The technology that enables computers to recognize text, Optical Character Recognition (OCR), is constantly evolving and expanding the range of what we can convert. It can now convert even handwritten text. This is an impressive feat: human handwriting is the most variable of “fonts”. Not only does it differ from person to person, but one individual’s handwriting is never identical from one occasion to the next. That’s a lot of variation for a computer to detect!
Any kind of raster text is tricky to convert, but handwritten characters take things to a whole new level of complexity. Unlike established fonts, handwriting rarely contains regular or predictable patterns, and patterns are basically what a computer searches for when you instruct it to find text within an image. This means that, if you’re looking to convert handwritten text, you need to use very sophisticated technology. Achieving the desired results depends both on selecting the right software and on ensuring your original image is optimized for conversion.
This article lays out the extent to which it is realistically possible to convert handwritten text using OCR. We explore the potential and limits of current technology, and provide advice on how to get the most out of your handwritten work in a CAD context.
Table of Contents
- What is OCR?
- How does OCR work?
- Why convert handwritten text?
- How to ensure successful conversion
- How to convert handwritten text
What is OCR?
Optical Character Recognition, or OCR, is the technology that allows software to recognize text within an image. It is therefore a vital stage in the process of converting raster text to vector text. In fact, OCR’s ability to extract text from graphics or documents makes it an incredibly useful tool across a wide range of industries. Consider security cameras that can read car number plates, or digital architectural blueprints containing editable annotations: neither would be possible without OCR.
It comes in particularly handy in the world of CAD. Anyone who’s attempted to manually trace an image containing text in order to convert it to a vector format knows that getting a computer to do the job is much easier! Until fairly recently, though, automatic tracing was not recommended if the image included handwritten text: a computer simply could not compete with the human eye’s ability to recognize letters and numbers.
With OCR technology, however, certain software can now be trained to recognize a wide range of fonts and convert them accordingly.
How does OCR work?
OCR uses more than one approach when it comes to recognizing text. The most basic way the technology distinguishes characters from pictures is through a technique known as pattern recognition. This involves a computer comparing objects within an image to letters already stored within its software. In other words, the software is equipped with a library of characters; it searches for the same patterns within your work and registers a character whenever it finds a match.
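To make the idea concrete, here is a minimal sketch of pattern recognition in Python. It is an illustration only, not how Scan2CAD or any particular OCR engine is implemented: the tiny 3x5 character library, the function names and the 0.9 threshold are all assumptions made for the example.

```python
# Hypothetical 3x5 character library; '#' marks an inked pixel.
TEMPLATES = {
    "H": ["#.#", "#.#", "###", "#.#", "#.#"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}


def match_score(glyph, template):
    """Fraction of pixels that agree between two same-sized bitmaps."""
    total = sum(len(row) for row in template)
    same = sum(
        g == t
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
    )
    return same / total


def recognize(glyph, threshold=0.9):
    """Return the best-matching character, or None if nothing is close enough."""
    best_char, best_score = max(
        ((ch, match_score(glyph, tpl)) for ch, tpl in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    return best_char if best_score >= threshold else None


printed_h = ["#.#", "#.#", "###", "#.#", "#.#"]   # identical to the stored 'H'
wobbly_h = ["#..", "#.#", "###", "#.#", "..#"]    # a hand-drawn 'H' with two stray pixels

print(recognize(printed_h))  # H
print(recognize(wobbly_h))   # None
```

The printed character matches its stored pattern exactly, while the hand-drawn one falls just short of the threshold and is rejected, even though a person would read it as an ‘H’ without hesitation.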
The problem with pattern recognition, at least for our purposes, is that it cannot detect handwritten text. No one writes in Times New Roman, after all. Thankfully, as the technology has become more sophisticated, it increasingly relies on a different tactic known as feature extraction.
Rather than trying to recognize full letters, feature extraction occurs when a computer detects certain features (lines and loops, for example) and understands that they signify a character. The letter ‘H’, for instance, will be picked up by the software whenever it detects two vertical lines joined in the middle by a smaller, horizontal line.
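The ‘H’ rule described above can be sketched in a few lines of Python. This is a deliberately simplified illustration under stated assumptions: strokes are one pixel wide, characters are small bitmaps of ‘#’ and ‘.’, and the function names and the 80% coverage threshold are invented for the example rather than taken from any real OCR product.

```python
def extract_features(glyph, coverage=0.8):
    """Find near-complete vertical and horizontal strokes in a '#'/'.' bitmap."""
    height, width = len(glyph), len(glyph[0])
    column_ink = [sum(row[x] == "#" for row in glyph) for x in range(width)]
    row_ink = [row.count("#") for row in glyph]
    return {
        "uprights": [x for x, ink in enumerate(column_ink) if ink >= coverage * height],
        "bars": [y for y, ink in enumerate(row_ink) if ink >= coverage * width],
    }


def looks_like_h(glyph):
    """Two uprights joined by one bar that sits away from the top and bottom edges."""
    features = extract_features(glyph)
    bars = features["bars"]
    last_row = len(glyph) - 1
    return (
        len(features["uprights"]) == 2
        and len(bars) == 1
        and 0 < bars[0] < last_row
    )


PRINTED_H = ["#...#", "#...#", "#...#", "#####", "#...#", "#...#", "#...#"]
# Hand-drawn variant: the bar sits lower and the left upright stops short.
HANDWRITTEN_H = ["#...#", "#...#", "#...#", "#...#", "#####", "#...#", "....#"]
LETTER_U = ["#...#", "#...#", "#...#", "#...#", "#...#", "#...#", "#####"]

print(looks_like_h(PRINTED_H))      # True
print(looks_like_h(HANDWRITTEN_H))  # True
print(looks_like_h(LETTER_U))       # False
```

Because the check looks for strokes rather than an exact pixel pattern, both the printed and the hand-drawn ‘H’ pass, while the ‘U’ is rejected because its bar sits on the bottom edge.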
This technique means that a computer’s ability to recognize characters is not constrained to a limited number of fonts. From here, it can be trained to detect even handwritten text.
Neural networks make this training possible. Conversion programs like Scan2CAD can train their OCR to recognize features from text examples that the user provides. Once the software has learned a certain style of writing from the examples you have input, it can detect the same writing in other pieces of work.
If OCR is trained to recognize a particular individual’s handwriting (perhaps someone who creates technical drawings), it opens up a whole world of possibilities in terms of what they can do with their work.
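As a rough illustration of what “training on your examples” means, the sketch below teaches the simplest possible neural network, a single perceptron, to tell a hand-drawn ‘L’ from a hand-drawn ‘I’. Everything here is an assumption made for the example: the 3x5 bitmaps, the function names and the two-letter alphabet. Real OCR training, including Scan2CAD’s, uses far larger networks and many more samples.

```python
def flatten(glyph):
    """Turn rows of '#'/'.' into a flat list of 1s and 0s."""
    return [1 if ch == "#" else 0 for row in glyph for ch in row]


def activation(weights, bias, pixels):
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def train(training_set, positive, max_epochs=200):
    """Perceptron rule: nudge the weights after every misread example."""
    weights = [0] * len(flatten(training_set[0][0]))
    bias = 0
    for _ in range(max_epochs):
        mistakes = 0
        for glyph, label in training_set:
            pixels = flatten(glyph)
            target = 1 if label == positive else -1
            guess = 1 if activation(weights, bias, pixels) > 0 else -1
            if guess != target:
                weights = [w + target * p for w, p in zip(weights, pixels)]
                bias += target
                mistakes += 1
        if mistakes == 0:  # every training example is now read correctly
            break
    return weights, bias


def classify(model, glyph, positive="L", negative="I"):
    weights, bias = model
    return positive if activation(weights, bias, flatten(glyph)) > 0 else negative


# Hypothetical training set: two samples of each handwritten letter.
TRAINING_SET = [
    (["#..", "#..", "#..", "#..", "###"], "L"),
    ([".#.", ".#.", ".#.", ".#.", ".#."], "I"),
    (["#..", "#..", "#..", "##.", ".##"], "L"),
    (["#..", "#..", "#..", "#..", "#.."], "I"),
]

model = train(TRAINING_SET, positive="L")

# Samples the network has never seen before.
unseen_l = ["#..", "#..", "#..", "#..", ".##"]
unseen_i = ["..#", ".#.", ".#.", ".#.", ".#."]
print(classify(model, unseen_l))  # L
print(classify(model, unseen_i))  # I
```

The network is never told what an ‘L’ looks like. It simply adjusts its weights until it reads every training sample correctly, and those weights then carry over to new samples written in a similar style.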
Why convert handwritten text?
If you’re starting out with handwritten text (either scanned into your computer or written on a tablet), it will be in a raster format. Converting the image to a vector format will make your work more versatile and allow edits to be made by yourself and others.
Problems with raster images
Raster images are made up of pixels. This means that if you attempt to zoom into or rescale the image you’re working on, the overall quality will suffer. In a professional context this is far from ideal. Take technical drawings, for example: your work may appear blurry when people zoom in to inspect certain details. Plus, it’s useful to be able to resize an image for different purposes, and this is not possible with a raster file without compromising its overall quality.
Vector images, on the other hand, are made up of objects. Each object (be it an arc, path, line, etc.) is defined by a mathematical equation. As every individual element has its own fixed relative position, re-scaling or zooming will not affect the overall quality of the image.
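A small Python sketch shows the difference. The two functions below are illustrative assumptions, not part of any CAD package: one scales a vector line by multiplying its coordinates, the other enlarges a raster bitmap by repeating pixels (nearest-neighbour scaling).

```python
def scale_vector(line, factor):
    """A vector line is just two endpoints, so scaling is exact arithmetic."""
    (x1, y1), (x2, y2) = line
    return ((x1 * factor, y1 * factor), (x2 * factor, y2 * factor))


def scale_raster(pixels, factor):
    """Nearest-neighbour upscaling: every pixel becomes a factor x factor block."""
    return [
        "".join(ch * factor for ch in row)
        for row in pixels
        for _ in range(factor)
    ]


diagonal_line = ((0, 0), (2, 2))
diagonal_pixels = ["#.", ".#"]

print(scale_vector(diagonal_line, 3))   # ((0, 0), (6, 6))
for row in scale_raster(diagonal_pixels, 2):
    print(row)
# ##..
# ##..
# ..##
# ..##
```

The scaled vector is still a perfect diagonal line, only longer. The scaled raster is the same two pixels blown up into blocks, which is the “staircase” effect you see when zooming into a scanned drawing.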
Editing your images with CAD software
Vector files are the ultimate choice if you are looking to edit your work with CAD or CNC software. The objects that make up a vector image can be edited individually, allowing for a high level of accuracy in the process. A raster file, by contrast, gives CAD software no individual objects to work with, so even the most basic adjustment has an impact on the entire image.
Anyone working in an industry that uses CAD requires vector images to get the most out of their projects. If you are working on an architectural design that includes useful handwritten annotations, for example, you want your collaborators to be able to both read and amend the text where necessary. This level of precision and control is not possible with a raster image.
How to ensure successful conversion
Converting handwritten text, though possible, is by no means a simple task. You need to be realistic about the kind of characters a computer is going to be able to detect. To optimize your chances of success, you need to make sure your original image is viable. If you’re looking for professional results, the image needs to be cleaned up as much as possible. Consult our raster text quality checklist to ensure you have completed this stage.
OCR software still has its limitations. If you find that your handwritten text cannot be converted automatically, it may be best to simply type over it with vector text.
The biggest issue that is flagged up by conversion software is image quality. If you want good quality results, you need to start with a good quality image. Computers are incredibly powerful, but they’re not miracle workers.
If the original file is of a low resolution, for instance, the software will have a hard enough time picking up any details, let alone the handwritten text! Your image should be clean and crisp, and it should contain no overlapping text. For the same reason, joined-up (cursive) handwriting, in which every letter runs into the next, is close to impossible for a computer to separate into individual characters.
There is actually a font specifically designed to be read by OCR technology, handily named OCR-A. It has long been associated with banking and other machine-read documents, where its simple, evenly spaced characters are easy for software to tell apart.
Generally speaking, for OCR purposes, established fonts like Arial are a suitable choice. This obviously isn’t realistic for what we’re covering here, but it’s a good rule of thumb to remember for general OCR practices. At least try to ensure your handwriting is as neat, consistent and clear as it can be.
As you’ll be using a non-standard font (handwriting), make use of technology like the aforementioned neural networks. If the relevant software is already trained to recognize your writing, you stand a higher chance of success when it comes to conversion.
The right software
Repeat after me: not all conversion software is created equal! This is especially apparent when it comes to converting text, be it handwritten or typed. The result you’re looking for is a text string. If you use a cheap online converter, you may end up with what is known as exploded text.
The latter is not in fact text, but a collection of vector shapes that are basically impossible to edit. Scan2CAD, meanwhile, will ensure that conversion produces text strings—text that is rendered correctly, presented logically and can be edited easily.
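The difference is easiest to see as data. The Python sketch below uses hypothetical class names (they are not Scan2CAD’s data model or any CAD file format) to contrast a text string with the same annotation stored as exploded shapes.

```python
from dataclasses import dataclass


@dataclass
class TextString:
    """A real text entity: the characters themselves are stored."""
    content: str
    position: tuple
    height: float


@dataclass
class Polyline:
    """A bare outline: just points, with no idea which letter it belongs to."""
    points: list


def edit_text(entity, old, new):
    """Editing a text string is a simple find-and-replace."""
    return TextString(entity.content.replace(old, new), entity.position, entity.height)


# The annotation as a text string...
note = TextString("WALL 200mm", position=(10.0, 5.0), height=2.5)
print(edit_text(note, "200", "250").content)  # WALL 250mm

# ...and the start of the same annotation as exploded text: one outline per stroke.
exploded = [
    Polyline([(10.0, 7.5), (10.5, 5.0), (11.0, 6.5), (11.5, 5.0), (12.0, 7.5)]),  # strokes of 'W'
    Polyline([(12.5, 5.0), (13.0, 7.5), (13.5, 5.0)]),                            # strokes of 'A'
]
print(hasattr(exploded[0], "content"))  # False
```

With a text string, changing “200” to “250” is a one-line edit. With exploded text there is nothing to search or replace, so the only option is to delete the shapes and redraw or retype the annotation.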
How to convert handwritten text
Once your raster image has been cleaned up and you’ve run through the checklist, it’s time to convert. Scan2CAD allows you to do this with handwritten text, and it works in two stages.
The first stage is font training which, as we’ve previously mentioned, involves using neural networks to train the software to recognize your writing. This is a fairly complex process, but don’t worry—the computer is doing most of the work!
In short, you’ll need to create a new training set, add your text examples, train the neural network to recognize them and then test that it has learned the new training set. For detailed instructions, head over to the ‘How to train Scan2CAD to recognize a font’ section of the user manual.
Now that your handwriting is detectable by the software, you can carry out the conversion by following the instructions in the ‘Convert a raster image with text’ section.
Once your image is saved in a vector format, you can start really making the most of your work!