Getting computers to recognize text within images can be a tricky business. Machines find it very difficult to separate text from other objects because they do not interpret letters and numbers in the same way as humans: to a machine, all elements are simply a collection of pixels. To get around this problem, CAD programs largely rely on techniques like pattern recognition and feature extraction to detect text within pictures. These processes are made possible by OCR, the technology that allows computers to extract text from images.
An OCR API can be used to add OCR capabilities to your software. OCR APIs are in high demand because converting text is a common requirement for image processing software, yet developing the capabilities yourself would be an extremely time-consuming and complex project.
There are many OCR API options out there to choose from. Big companies like Amazon and Microsoft provide these services, as do lesser-known companies that offer free versions. With so many factors to weigh up, it can be hard to know which one you should go for. For this reason, we’ve compared a few of the best-looking options available. Let’s see how they measured up…
Table of contents
- What is OCR?
- OCR API
- What to look for in OCR APIs
- The best options compared
- OCR API—which to go for?
What is OCR?
OCR, or Optical Character Recognition, is the technology that enables software to recognize text in an image. Within a CAD context, you may know it as the feature that enables you to convert raster text into vector text—thus allowing you to edit said text with CAD software.
Initially, OCR relied on the process of pattern recognition to distinguish text from other image elements. The technology would compare objects in an image to a library of figures it already had stored. When it found a match, it would know to regard it as text. This technique was fairly limited, as OCR then only stored well-established fonts, like Times New Roman or its very own font, OCR-A.
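To make the idea concrete, here is a minimal sketch of pattern matching against a stored library. The tiny 3x3 bitmaps, the 0.8 threshold and the names `GLYPH_LIBRARY` and `match_glyph` are all invented for illustration; real OCR engines work on far larger images and font libraries.

```python
# Toy pattern-recognition OCR: compare a glyph bitmap with a stored library.
# '#' is an inked pixel, '.' is background. All names here are illustrative.
GLYPH_LIBRARY = {
    "T": ("###",
          ".#.",
          ".#."),
    "L": ("#..",
          "#..",
          "###"),
    "I": (".#.",
          ".#.",
          ".#."),
}


def similarity(a, b):
    """Fraction of pixels that agree between two same-sized bitmaps."""
    total = sum(len(row) for row in a)
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total


def match_glyph(bitmap, threshold=0.8):
    """Return the best-matching character, or None if nothing is close enough."""
    best_char, best_score = None, 0.0
    for char, template in GLYPH_LIBRARY.items():
        score = similarity(bitmap, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= threshold else None
```

A slightly noisy ‘T’ still matches, but a shape that is not in the library is rejected. This is the limitation described above: the approach only knows the figures it has stored.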
As the technology behind OCR has improved, it has increasingly relied on the technique known as feature extraction. This involves a computer associating various features presented in a certain combination with particular letters or figures. For example, a vertical line topped with a smaller, horizontal line is understood to represent the letter ‘T’. Once OCR software can perform feature extraction, it can even be trained to recognize certain handwriting.
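The ‘T’ example can be sketched in a few lines. This is a toy under stated assumptions: glyphs arrive as small bitmaps, and the two features checked (`top_bar`, `center_stem`) and the single rule in `classify` are made up for illustration. Real systems extract many more features and usually learn the rules from training data.

```python
# Toy feature extraction: describe a glyph by its strokes, not exact pixels.
# '#' is an inked pixel, '.' is background. All names here are illustrative.

def extract_features(bitmap):
    """Return a few stroke-level features of a glyph bitmap."""
    width = len(bitmap[0])
    center_column = "".join(row[width // 2] for row in bitmap)
    return {
        # Horizontal line across the top: the first row is fully inked.
        "top_bar": set(bitmap[0]) == {"#"},
        # Vertical line: the center column is inked from top to bottom.
        "center_stem": set(center_column) == {"#"},
        # Horizontal line across the bottom: the last row is fully inked.
        "bottom_bar": set(bitmap[-1]) == {"#"},
    }


def classify(bitmap):
    """Apply the rule: a vertical line topped with a horizontal line is 'T'."""
    features = extract_features(bitmap)
    if features["top_bar"] and features["center_stem"] and not features["bottom_bar"]:
        return "T"
    return None
```

Because the rule looks at strokes instead of comparing pixels one by one, the same code accepts a 3x3 ‘T’ and a 5x5 ‘T’ without needing a stored copy of either.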
OCR API
API stands for Application Programming Interface. It’s a fairly general term that can cover a wide range of technologies. You can think of an API as a tool that allows a distinct piece of software to interact with an established application or program, with the purpose of providing certain methods or properties that the main application lacks.
So, in the case of OCR, an API could be used to detect and extract text from an image that you provide. This is really useful for people working with software that doesn’t offer OCR capabilities. The OCR API then returns the text in a form that is editable or easier to display.
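In practice, that exchange is usually an HTTP request carrying the image and a response carrying the text. The sketch below uses only Python’s standard library. The endpoint URL, the bearer-token header and the `"text"` field in the JSON reply are assumptions made for illustration; every real provider defines its own URL, authentication and response format.

```python
import json
import urllib.request

# Hypothetical endpoint: not a real service.
OCR_ENDPOINT = "https://ocr.example.com/v1/recognize"


def build_ocr_request(image_bytes, api_key):
    """Prepare a POST request that uploads raw image bytes to the OCR service."""
    return urllib.request.Request(
        OCR_ENDPOINT,
        data=image_bytes,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )


def parse_ocr_response(body):
    """Pull the recognized text out of an assumed {"text": "..."} JSON reply."""
    payload = json.loads(body)
    return payload.get("text", "")
```

To actually send the request you would pass it to `urllib.request.urlopen` and hand the response body to `parse_ocr_response`. That step is left out here because the endpoint is fictional.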
What to look for in OCR APIs
There are certain qualities that you should always look out for when shopping around for an OCR API. The most important feature is that the technology should be able to extract data (letters and figures) correctly and with precision. This might sound obvious, but you’d be amazed by how many applications fail to cut the mustard.
If you’re already a pro at converting raster files to vector formats, you’ll know all about the pitfalls of exploded text. In short, when using OCR to extract text from an image, the result you’re looking for is text strings. This means the characters are rendered and presented correctly and can be easily edited.
Software that lacks precision and accuracy may send you back a file containing exploded text instead. This is not really text, but rather a group of vector shapes that will be almost impossible to edit. Selecting the right OCR API is vital if you want to avoid these annoying flaws.
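The difference between the two outcomes is easiest to see as data. The sketch below is purely illustrative: `TextString` and `Polyline` are hypothetical stand-ins for the entities a CAD file might contain, and the coordinates are made up.

```python
from dataclasses import dataclass


@dataclass
class TextString:
    """A real text entity: the characters are stored, so they can be edited."""
    content: str
    x: float
    y: float
    height: float


@dataclass
class Polyline:
    """A vector shape: a list of (x, y) points with no idea what it depicts."""
    points: list


# Good OCR output: one editable entity. Fixing a misread takes one line.
label = TextString(content="R00M 101", x=10.0, y=20.0, height=2.5)
label.content = label.content.replace("R00M", "ROOM")

# Exploded text: the same label as anonymous outlines. There are no
# characters to search or replace, only geometry to redraw by hand.
exploded = [
    Polyline(points=[(10.0, 20.0), (10.0, 22.5), (11.2, 22.5)]),
    Polyline(points=[(11.5, 20.0), (11.5, 22.5)]),
]
```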
Outside of capability and precision, price and ease of use are aspects to consider before making any software selections. Sometimes you can find efficient services for free; other times, it’s worth shelling out for a top-quality product. This is why it’s important to make an informed choice before parting with your cash.
As for ease of use, it’s important to make sure you are providing the software with images that are optimized for OCR.
Issues that can stump OCR technology
Yes, OCR technology is very sophisticated and its capabilities increasingly impressive, but you do need to meet it halfway. As is the case with converting raster images to vector images, when using OCR you should make sure your original image is of high quality.
You can’t expect the technology to be able to detect text in an image that is out of focus and blurry. Similarly, OCR may struggle to separate characters that are very similar (like ‘S’ and ‘5’) or presented in a confusing manner. For the best results, make sure you’re providing a strong starting point.
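One way to check your starting point before spending an API call is to estimate how sharp the image is. A common heuristic is the variance of the Laplacian: crisp edges produce large swings in the Laplacian, while a blurry image produces very few. The sketch below assumes the image is already a grid of grayscale values from 0 to 255, and the threshold of 100.0 is an arbitrary placeholder that you would need to tune on your own scans.

```python
def laplacian_variance(gray):
    """Variance of a 4-neighbor Laplacian over the interior pixels."""
    height, width = len(gray), len(gray[0])
    values = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            laplacian = (
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
            values.append(laplacian)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def looks_sharp(gray, threshold=100.0):
    """Rough pre-check: True if the image has enough edge contrast for OCR."""
    return laplacian_variance(gray) >= threshold


# A hard black-to-white edge versus a smooth gradient of the same size.
sharp_image = [[0, 0, 0, 255, 255, 255] for _ in range(6)]
blurry_image = [[x * 51 for x in range(6)] for _ in range(6)]
```

Here `sharp_image` passes the check and `blurry_image` does not, even though both run from black to white across the same width.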
The best options compared
There are plenty of OCR API options to be found on the internet. From the offerings of the major tech giants to free online converters, we compare five of the best below.
1 Microsoft Computer Vision
Microsoft Computer Vision, part of the Microsoft Azure platform, offers so much more than OCR capabilities. This software has the ability to analyze video, recognize celebrities and read handwritten text within images (though the latter is still in preview stage).
If basic OCR is all you’re looking for, the free tier option is quite generous. You’re allowed up to 5,000 transactions per month. With the power of Microsoft behind it, you can expect accurate results and a wide range of special features. You can pick from two OCR endpoints: image file or URL.
Price: Free tier, then $2.50 per 1,000 transactions
Features: Analyzes and extracts text from images, recognizes celebrities and landmarks, video analysis
2 Google Cloud Vision
Once again, this is a service that offers much more than OCR. Google Cloud Vision can recognize a wide range of text (including handwritten), detect faces and landmarks, and even extract logos. This OCR API benefits from having the power of Google image search behind it, providing a huge library of brand logos from which the software can perform feature recognition.
The free package allows for up to 1,000 free API calls per month. So, not quite as generous as Microsoft, but with this API you are treated to a slightly larger range of extra capabilities.
Price: Free tier, then $1.50 per 1,000 transactions
Features: Label detection, handwriting recognition, range of languages supported
3 Amazon Rekognition
This platform is divided into two main features: Rekognition Video and Rekognition Image. Amazon’s OCR is referred to as ‘Text in Image’, part of the Rekognition Image suite.
Amazon Rekognition boasts the ability to locate and extract text from both natural and on-screen scenes. Once analyzed, the text will be returned with a detected text label and a confidence score. The free tier lets you analyze up to 5,000 images per month.
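The confidence score is handy for discarding doubtful reads. The sketch below works on a hand-written dictionary shaped like the reply from Rekognition’s `DetectText` operation, where each entry in `TextDetections` carries `DetectedText`, `Type` and `Confidence`. The sample values and the 90.0 cut-off are invented for illustration, and no call to AWS is made.

```python
# Made-up sample shaped like a DetectText response; not real service output.
sample_response = {
    "TextDetections": [
        {"DetectedText": "EXIT 12", "Type": "LINE", "Id": 0, "Confidence": 98.7},
        {"DetectedText": "EXIT", "Type": "WORD", "Id": 1, "ParentId": 0, "Confidence": 99.2},
        {"DetectedText": "12", "Type": "WORD", "Id": 2, "ParentId": 0, "Confidence": 98.1},
        {"DetectedText": "l2", "Type": "LINE", "Id": 3, "Confidence": 41.5},
    ]
}


def confident_lines(response, min_confidence=90.0):
    """Keep whole lines of text whose confidence meets the cut-off."""
    return [
        detection["DetectedText"]
        for detection in response["TextDetections"]
        if detection["Type"] == "LINE"
        and detection["Confidence"] >= min_confidence
    ]
```

With the default cut-off, only the "EXIT 12" line survives. The low-confidence "l2" is exactly the kind of look-alike misread you would want to review by hand.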
The technology behind this platform is sophisticated and there is a high emphasis on customer service, should you run into any problems. However, this API option comes up short in terms of extra capabilities, unless you’re working with videos. Another downside is that video analysis only works with files stored in Amazon S3, although images can also be sent directly as raw bytes.
Price: Free tier, then $1 per 1,000 transactions
Features: Text recognition, real-time analysis, activity detection
4 Cloudmersive
This API can convert scanned images, photos of documents and receipts into text. Over 90 different languages are supported and it can even deskew and rotate text that has been captured at an angle (don’t push this last one too far, though; cleaning up your images is still vital).
Though it doesn’t come with the backing of a giant tech name, hold on to your hats, because Cloudmersive provides an incredibly generous free version. Sign up for an account and you’ll be allowed up to 50,000 calls per month. Do bear in mind, though, that this tool is designed for simple text recognition and extraction. Don’t expect many extra features.
Price: Free, up to $499.99/month (business package)
Features: OCR, document and data conversion, image recognition and processing
5 Free OCR
This API sells itself as a simple way to get text extracted from images and PDF documents. Despite its name, not all versions of this software come without a price. Nevertheless, the free tier is fairly generous—allowing for 25,000 requests a month.
Free OCR is probably a good option for people who only have very basic OCR needs. It’s a no-frills affair and basically does what it says on the tin. We’d recommend using the free tier to test out the quality of the results. Be wary of free online services, as they don’t always provide professional results. On the other hand, such sites are certainly worth a look if you’re just playing around with images for fun.
Price: Free, up to $49.95/month
Features: Locates and extracts text from images and PDF documents
OCR API—which to go for?
As you can see, there are plenty of OCR APIs available on the web. Which one you go for will largely depend on how many images you need to process and the extent to which extra capabilities (like face recognition) will be useful to you. It’s probably worth trying out a few free versions before you commit to anything.
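If volume is your main concern, a quick back-of-the-envelope calculation helps. The sketch below uses the free allowances and per-1,000 prices quoted above for the three pay-as-you-go services. It assumes a single flat rate beyond the free tier, which is a simplification: real pricing may be tiered or time-limited, so check each provider’s current price list.

```python
# (free calls per month, price in dollars per 1,000 calls after that),
# taken from the figures quoted in this article.
PLANS = {
    "Microsoft Computer Vision": (5000, 2.50),
    "Google Cloud Vision": (1000, 1.50),
    "Amazon Rekognition": (5000, 1.00),
}


def monthly_cost(calls, free_calls, price_per_1000):
    """Estimated monthly bill, assuming a flat rate beyond the free tier."""
    billable = max(0, calls - free_calls)
    return billable / 1000 * price_per_1000


def compare(calls):
    """Return {service name: estimated monthly cost} for a given call volume."""
    return {
        name: monthly_cost(calls, free_calls, price)
        for name, (free_calls, price) in PLANS.items()
    }
```

At 20,000 images a month, for example, this works out to $37.50 for Microsoft, $28.50 for Google and $15.00 for Amazon. At 1,000 images a month, all three cost nothing.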
Working with CAD and don’t have time to experiment with different applications? Rather than signing up for an OCR API, you can rely on software like Scan2CAD to serve all of your needs in one place. Basically, it provides the whole conversion package, including OCR and a full raster and vector editing suite, saving you from having to fiddle about with different software providers.
No need to take our word for it: sign up for a free trial below and see why Scan2CAD is the ultimate vectorization software!