OCR Guide: Converting Handwritten Text

Updated Jul 8, 2019

Optical Character Recognition (OCR), the technology that enables computers to recognize text, is constantly evolving and expanding the range of what we can convert. It can now convert even handwritten text. This is an impressive feat: human handwriting is, of course, the most random and changeable of fonts. Not only does it differ from person to person, but one individual’s handwriting will not be identical each time they write. That’s a lot of variation for a computer to attempt to detect!

Any kind of raster text is tricky to convert, but handwritten characters take things to a whole new level of complexity. Unlike established fonts, handwriting rarely contains regular or predictable patterns, which is basically what computers search for when you instruct them to find text within an image. This means that, if you’re looking to convert handwritten text, you need to use very sophisticated technology. Achieving the desired results depends both on selecting the right software and on ensuring your original image is optimized for conversion.

This article lays out the extent to which it is realistically possible to convert handwritten text using OCR. We explore the potential and limits of current technology, and provide advice on how to get the most out of your handwritten work in a CAD context. 

Comparing handwritten text vs OCR vector text

The results of using OCR on handwritten text in Scan2CAD




What is OCR?

Optical Character Recognition (OCR) in Scan2CAD

Optical Character Recognition, or OCR, is the technology that allows software to recognize text within an image. It thus performs a vital stage in the process of converting raster text to vector text. In fact, OCR’s ability to extract text from graphics or documents makes it an incredibly useful tool across a wide range of industries. Consider security cameras that can pick up car number plates, or digital architectural blueprints containing editable annotations—neither would be possible without OCR.

It comes in particularly handy in the world of CAD. Anyone who’s attempted to manually trace an image containing text in order to convert it to a vector format knows that getting a computer to do the job is much easier! Until fairly recently, though, automatic tracing was not recommended if the image to be converted included handwritten text. A computer simply could not compete with the human eye’s ability to recognize letters and numbers.

With OCR technology, however, certain software can now be trained to recognize a wide range of fonts and convert them accordingly.


How does OCR work?

OCR uses more than one approach when it comes to recognizing text. The most basic way the technology distinguishes characters from pictures is through a technique known as pattern recognition. This involves a computer comparing objects within an image to letters already stored within its software. In other words, the software is equipped with a library of characters and the computer will search for the same patterns within your work and recognize when it finds a match.

OCR-A Font Preview

The computer refers to its own catalog of characters to carry out pattern recognition
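To make pattern recognition concrete, here is a minimal sketch in Python. It assumes characters arrive as tiny 5x5 bitmaps and that a match means at least 90% of pixels agree. The template library, the threshold and the function names are all made up for illustration and do not reflect how any particular OCR product stores its fonts.

```python
# Toy pattern recognition: compare a glyph against a stored library of
# character bitmaps. The 5x5 templates and the 0.9 threshold are
# illustrative assumptions, not values from any real OCR engine.

TEMPLATES = {
    "H": ["#...#",
          "#...#",
          "#####",
          "#...#",
          "#...#"],
    "L": ["#....",
          "#....",
          "#....",
          "#....",
          "#####"],
    "T": ["#####",
          "..#..",
          "..#..",
          "..#..",
          "..#.."],
}


def similarity(glyph, template):
    """Fraction of pixels that agree between two same-sized bitmaps."""
    total = 0
    same = 0
    for glyph_row, template_row in zip(glyph, template):
        for g, t in zip(glyph_row, template_row):
            total += 1
            same += (g == t)
    return same / total


def match(glyph, threshold=0.9):
    """Return the best-matching character, or None if nothing is close enough."""
    best_char, best_score = None, 0.0
    for char, template in TEMPLATES.items():
        score = similarity(glyph, template)
        if score > best_score:
            best_char, best_score = char, score
    return best_char if best_score >= threshold else None


printed_h = TEMPLATES["H"]          # a perfectly printed character
wobbly_h = ["#..#.",                # a hand-drawn H with uneven strokes
            "#..#.",
            "####.",
            ".#..#",
            ".#..#"]

print(match(printed_h))  # H
print(match(wobbly_h))   # None
```

The printed character matches its template exactly, while the hand-drawn version falls below the threshold against every template, which is the weakness of this approach when the input does not follow a stored font.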

The problem with pattern recognition, at least for our purposes, is that it cannot detect handwritten text. No one writes in Times New Roman, after all. Thankfully, as the technology has become more sophisticated, it increasingly relies on a different tactic known as feature extraction.

Rather than trying to recognize full letters, feature extraction occurs when a computer detects certain features (lines and loops, for example) and understands that they signify a character. The letter ‘H’, for instance, will be picked up by the software whenever it detects two vertical lines joined in the middle by a smaller, horizontal line.
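The ‘H’ rule above can be sketched in a few lines of Python. This is a deliberately simplified illustration: it assumes upright characters on a small bitmap, treats any column that is at least 80% ink as a vertical stroke, and uses hypothetical function names. Real feature extraction has to cope with slanted and uneven strokes.

```python
# Toy feature extraction: decide whether a bitmap looks like an 'H' by
# finding two vertical strokes joined by a horizontal bar. The 0.8 ink
# fraction and all function names are assumptions for illustration.

def vertical_strokes(bitmap, min_fraction=0.8):
    """Return the columns in which ink fills most of the height."""
    height = len(bitmap)
    width = len(bitmap[0])
    columns = []
    for x in range(width):
        ink = sum(1 for y in range(height) if bitmap[y][x] == "#")
        if ink >= min_fraction * height:
            columns.append(x)
    return columns


def has_crossbar(bitmap, left, right):
    """True if a row between top and bottom is inked from left to right."""
    for y in range(1, len(bitmap) - 1):
        if all(bitmap[y][x] == "#" for x in range(left, right + 1)):
            return True
    return False


def looks_like_h(bitmap):
    """Two separated vertical strokes plus a joining bar suggests an 'H'."""
    strokes = vertical_strokes(bitmap)
    if len(strokes) < 2:
        return False
    left, right = strokes[0], strokes[-1]
    return right - left >= 2 and has_crossbar(bitmap, left, right)


small_h = ["#.#",
           "###",
           "#.#"]
tall_h = ["#....#",
          "#....#",
          "#....#",
          "######",
          "#....#",
          "#....#",
          "#....#"]
letter_l = ["#...",
            "#...",
            "#...",
            "####"]

print(looks_like_h(small_h), looks_like_h(tall_h), looks_like_h(letter_l))
```

Because the check looks at features instead of pixel-for-pixel agreement, both the small and the tall ‘H’ pass even though they differ in size and proportions, and the ‘L’ is rejected.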

This technique means that a computer’s ability to recognize characters is not constrained to a limited number of fonts. From here, it can be trained to detect even handwritten text. 

Neural networks

Once software is able to perform feature extraction, it can be trained to detect features in handwritten text. Using neural networks, conversion programs like Scan2CAD can train OCR to recognize features from text that the user provides. Once the software has learned to recognize a certain style of text from the examples you have input, you can use it to detect the same writing in other pieces of work.

If OCR is trained to recognize a particular individual’s handwriting (perhaps someone who creates technical drawings), it opens up a whole world of possibilities in terms of what they can do with their work.
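To give a feel for what training on user-provided examples means, here is a sketch of a multiclass perceptron, one of the simplest forms of neural network, written in plain Python. It is not how Scan2CAD works internally. The three-number feature vectors (vertical strokes, horizontal strokes, loops) and the tiny training set are invented purely for illustration.

```python
# Toy training loop: a multiclass perceptron learns to tell characters
# apart from labelled examples. Each sample is a made-up feature vector of
# (vertical strokes, horizontal strokes, loops); none of this is taken
# from a real OCR product.

def predict(weights, features):
    """Return the label whose weight vector scores the features highest."""
    x = list(features) + [1.0]  # trailing 1.0 feeds the bias weight

    def score(label):
        return sum(w * v for w, v in zip(weights[label], x))

    return max(weights, key=score)


def train(examples, max_epochs=100):
    """Adjust weights on every mistake until a full pass has no errors."""
    labels = sorted({label for _, label in examples})
    size = len(examples[0][0]) + 1
    weights = {label: [0.0] * size for label in labels}
    for _ in range(max_epochs):
        mistakes = 0
        for features, target in examples:
            guess = predict(weights, features)
            if guess != target:
                mistakes += 1
                x = list(features) + [1.0]
                for i, value in enumerate(x):
                    weights[target][i] += value  # pull the right label closer
                    weights[guess][i] -= value   # push the wrong label away
        if mistakes == 0:
            break
    return weights


training_set = [
    ((2, 1, 0), "H"),
    ((1, 1, 0), "L"),
    ((0, 0, 1), "O"),
]
weights = train(training_set)
learned = all(predict(weights, f) == label for f, label in training_set)
print("training set learned:", learned)
```

The final check mirrors the usual workflow: after training, you test whether the network reproduces the labels of the examples it was given before relying on it for new drawings.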


Why convert handwritten text?

Fountain pen writing on lined paper

If you’re starting out with handwritten text (either scanned into your computer or written on a tablet), it will be in a raster format. Converting the image to a vector format will make your work more versatile and allow edits to be made by yourself and others.

Problems with raster images

Quality issues

Raster images are made up of pixels. This means that if you attempt to zoom into or rescale the image you’re working on, the overall quality will suffer. In a professional context this is not exactly ideal. Take technical drawings, for example: your work may appear blurry when people zoom in to inspect certain details. Plus, it’s useful to be able to resize an image for different purposes. This is not possible with a raster file without compromising its overall quality.

Vector images, on the other hand, are made up of objects. Each object (be it an arc, path, line, etc.) is defined by a mathematical equation. As every individual element has its own fixed relative position, re-scaling or zooming will not affect the overall quality of the image.
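The difference is easy to see in a small Python sketch. It assumes a vector line stored as two endpoints and a raster image stored as rows of characters, and it uses nearest-neighbour upscaling as the simplest possible raster resize. The function names are hypothetical.

```python
# Toy comparison of scaling a vector line and a raster image. The data
# layouts and function names are assumptions chosen for illustration.

def scale_vector_line(line, factor):
    """Scale a line given as ((x1, y1), (x2, y2)); it stays one exact line."""
    (x1, y1), (x2, y2) = line
    return ((x1 * factor, y1 * factor), (x2 * factor, y2 * factor))


def scale_raster(bitmap, factor):
    """Nearest-neighbour upscaling: each pixel becomes a factor x factor block."""
    scaled = []
    for row in bitmap:
        wide_row = "".join(pixel * factor for pixel in row)
        scaled.extend([wide_row] * factor)
    return scaled


diagonal_vector = ((0, 0), (3, 3))
diagonal_raster = ["#...",
                   ".#..",
                   "..#.",
                   "...#"]

print(scale_vector_line(diagonal_vector, 3))
for row in scale_raster(diagonal_raster, 2):
    print(row)
```

The scaled vector is still a single line described by two points, so it can be drawn crisply at any size. The scaled raster is the same diagonal built from bigger square blocks, which is the staircase effect you see when zooming into a scan.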

Editing your images with CAD software

Vector files are the ultimate choice if you are looking to edit your work with CAD or CNC software. The objects that comprise a vector image can be edited individually, allowing for a high level of accuracy in the process. Raster files, by contrast, cannot be edited object by object in CAD software, and even the most basic adjustments will have an impact on the entire image.

Anyone working in an industry that uses CAD requires vector images to get the most out of their projects. If you are working on an architectural design that includes useful handwritten annotations, for example, you want your collaborators to be able to both read and amend the text where necessary. This level of precision and control is not possible with a raster image.


How to ensure successful conversion

Converting handwritten text, though possible, is by no means a simple task. You need to be realistic about the kind of characters a computer is going to be able to detect. To optimize your chances of success, you need to make sure your original image is viable. If you’re looking for professional results, the image needs to be cleaned up as much as possible. Consult our raster text quality checklist to ensure you have completed this stage.

Poor image quality for raster to vector conversion

Raster images with any of these problems are unlikely to convert successfully.

OCR software still has its limitations. If you find that your handwritten text cannot be converted automatically, it may be best to simply type over it with vector text. 

Image quality

The biggest issue flagged by conversion software is image quality. If you want good-quality results, you need to start with a good-quality image. Computers are incredibly powerful, but they’re not miracle workers.

If the original file is of a low resolution, for instance, the software will have a hard enough time picking up any details, let alone the handwritten text! Your image should be clean and crisp, with no overlapping text. It should go without saying, therefore, that joined-up handwriting, where the characters touch one another, will be impossible for a computer to detect.
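One rough way to check whether characters touch is to count the connected blobs of ink in a cleaned-up, black-and-white image. The Python sketch below assumes the image has already been reduced to rows of ‘#’ (ink) and ‘.’ (background). The function name is hypothetical, and the check is only a heuristic, since letters such as ‘i’ naturally consist of two separate blobs.

```python
# Toy check for touching characters: count connected ink regions using a
# flood fill with 4-connectivity. The bitmap format and function name are
# assumptions for illustration.

def connected_components(bitmap):
    """Return a list of components, each a set of (row, col) ink pixels."""
    height, width = len(bitmap), len(bitmap[0])
    seen = set()
    components = []
    for r in range(height):
        for c in range(width):
            if bitmap[r][c] != "#" or (r, c) in seen:
                continue
            stack = [(r, c)]
            seen.add((r, c))
            component = set()
            while stack:
                y, x = stack.pop()
                component.add((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and bitmap[ny][nx] == "#" and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            components.append(component)
    return components


isolated = ["#.#.#",   # an 'H' and an 'I' with a gap between them
            "###.#",
            "#.#.#"]
joined = ["#.#.#",     # the same letters with the bar running into the 'I'
          "#####",
          "#.#.#"]

print(len(connected_components(isolated)))  # 2
print(len(connected_components(joined)))    # 1
```

When two written characters come back as a single blob, the software has no clean boundary to split them on, which is why isolated, hand-printed lettering converts so much better than joined-up writing.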

Font

There is actually a font specifically designed to be read by OCR technology, handily named OCR-A. It’s commonly used for banking purposes—you’ll recognize it as the font on credit cards and cheques.

Digits in OCR-A font


Generally speaking, for OCR purposes, established fonts like Arial are a suitable choice. This obviously isn’t realistic for what we’re covering here, but it’s a good rule of thumb to remember for general OCR practices. At least try to ensure your handwriting is as neat, consistent and clear as it can be.

As you’ll be using a non-standard font (handwriting), make use of technology like the aforementioned neural networks. If the relevant software is already trained to recognize your writing, you stand a higher chance of success when it comes to conversion. 

The right software

Example of vector text strings. This is the desired result of vectorization because they can be edited and displayed correctly.

Repeat after me: not all conversion software is created equal! This is especially apparent when it comes to converting text, be it handwritten or typed. The result you’re looking for is a text string. If you use a cheap online converter, you may end up with what is known as exploded text.

The latter is not in fact text, but a collection of vector shapes that are basically impossible to edit. Scan2CAD, meanwhile, will ensure that conversion produces text strings—text that is rendered correctly, presented logically and can be edited easily. 
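The distinction is easier to see as data. The Python sketch below models a text string and exploded text with two hypothetical classes. The class names, fields and coordinates are invented for illustration and do not correspond to any CAD file format.

```python
# Toy data model contrasting a text string with exploded text. All class
# names, fields and coordinates are assumptions for illustration.

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TextString:
    content: str    # the characters themselves, stored as editable text
    x: float        # insertion point
    y: float
    height: float   # character height in drawing units


@dataclass(frozen=True)
class Polyline:
    points: tuple   # vertices only; nothing says which letter this belongs to


# A text string: correcting the label is a single change to its content.
label = TextString("WALL 2400", x=10.0, y=5.0, height=2.5)
corrected = replace(label, content="WALL 2700")

# Exploded text: the same label would be a pile of unrelated outlines.
# Changing a digit means redrawing shapes by hand.
exploded = [
    Polyline(((10.0, 5.0), (10.5, 7.5), (11.0, 5.0))),
    Polyline(((11.5, 5.0), (12.0, 7.5), (12.5, 5.0))),
]

print(corrected.content)
print(any(hasattr(shape, "content") for shape in exploded))
```

The text string keeps its position and height when the wording changes, whereas the exploded version carries no text at all, only geometry.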


How to convert handwritten text

Once your raster image has been cleaned up and you’ve run through the checklist, it’s time to convert. Scan2CAD allows you to do this with handwritten text, and it works in two stages.

The first stage is font training which, as we’ve previously mentioned, involves using neural networks to train the software to recognize your writing. This is a fairly complex process, but don’t worry—the computer is doing most of the work!

In short, you’ll need to create a new training set, add your text examples, train the neural network to recognize them and then test that it has learned the new training set. For detailed instructions, head over to the ‘How to train Scan2CAD to recognize a font’ section of the user manual.

Now that your handwriting is detectable by the software, you can carry out the conversion by following the instructions in the ‘convert a raster image with text’ section.

Once your image is saved in a vector format, you can start really making the most of your work! 

Video: Converting handwritten text with Scan2CAD

Video transcript

In this video, we will be converting handwritten text in this image using OCR to editable vector text strings. So we’re going to be doing this with Scan2CAD. If you don’t know what OCR is, it’s Optical Character Recognition, and that’s the process of converting text in an image, such as this image – we can see the pixels here – to editable vector text strings.

This is different to vectorizing the image. I’ll show you, first, an example of vectorizing if we just rush through the settings here in Scan2CAD to show you a simple conversion. We’ve vectorized the image now, and we’re viewing a vector design that can be edited like so. But these are not editable vector text strings, they’re just polygons in the shape of the original text. So let’s close the vector file we created, and instead, we are going to use OCR.

So we’ll go to the OCR option, go to OCR settings. I’ll turn on the image in the preview to see the image here. I’ll select the character size from the image just by drawing like so. Let’s increase that a little bit to 60. We don’t need to enable vertical text since all of the text in this image is horizontal. We have English as the language. And let’s change the drawing type or document type to text. If we had other elements in this image that we wanted to convert and not use OCR on, for example, if this were a technical drawing that contained lines, arcs and circles as well as text, then we’d use technical. But since this is 100% text, we can use the text option. Click Run to run the process, and it’s complete. And let’s save the results to the canvas by clicking OK. And we’re viewing the vector and raster image here. Let’s go to View Vector Colors so we can view the text in pink. Let’s turn off the raster image for a second underneath.

So we’ve got hand-printed script all correctly recognized. We have hand-printed script down the bottom here, but there’s a couple of little elements we’d want to change. We can do so manually like so. Seems to be an extra space character because in the handwriting, there’s more space than expected. Click OK to save the results. So we can see here, as we compare the raster and the vector, how it’s done pretty well at recognizing this text. This is because the handwriting is isolated, so we have characters that aren’t touching. It’s quite legible. And generally, it’s quite a suitable type of handwriting for OCR.
