Choosing OCR Software: Converting Text in Technical Drawings

Updated Jul 8, 2019
Comparison of poor quality raster text with vector text string

A key benefit of vector images over their raster counterparts is their ability to include editable text. The text in a raster image is nothing more than a collection of pixels. As such, it’s indistinguishable on a technical level from the remainder of the image. A vector image, however, is capable of storing text as a separate, editable entity. This means that, if you have a technical drawing containing text, then converting it from raster to vector is the logical choice.

In this article, we’ll run through everything you need to know about converting text in technical drawings. We’ll start with the reasons why it’s so important to convert your text. We’ll also go into detail about why this process can be quite tricky, and alert you to some of the problems you may encounter. After that, we’ll move on to how to perform the conversion, and provide some pro tips to help you ensure that everything goes off without a hitch. We’ll even show you the best OCR software to use to convert your text. Let’s get started!


Composition of raster text

A floorplan saved in .TIFF format with the labels "Bedroom" and "Bathroom"

This raster version of a floorplan is neither editable nor scalable.

Pixels are the building blocks that make up the entirety of a raster image. Each pixel is no more and no less than a square of color. As such, there is no particular structure to a raster image, and nothing to distinguish one part of an image from another.

With this in mind, any text that features in a raster image is, in a technical sense, nothing more than pixels. Anyone who’s ever had to edit a raster image will be well aware of the problems this causes.

It is impossible, for example, to go back and edit the text in your raster image. If you’re lucky, you might be able to use a paint brush or eraser tool to white out the text in your image—already a cumbersome process. In some cases, however, even this might not be possible.

To put a long story short, raster text is simply unsuitable for editing. Worse still, it comes with a whole host of other common raster image issues. These include pixelation when zooming or scaling an image, and the inability to attach data to your text.


Composition of vector text

Vector floorplan

Meanwhile, you can edit this vector version of a floorplan in CAD software.

Vector text, on the other hand, is an entirely different beast. Unlike the pixels which make up a raster image, all elements within a vector image are distinct.

Each element is mathematically defined, with a fixed relative position within the image. As a result, each element appears the same at any scale, making it possible to zoom into an image without losing quality.

Another upshot of this is that it is possible to easily edit each element within an image. This includes vector text. Say, for example, you noticed a typo, or simply wished to add more information to the text within your image. As long as you’re using vector text, this is a cinch.

It’s also possible to attach information to each element within a vector image. This means that you can add additional specifications to objects and text.
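
As a rough illustration of the difference, the sketch below models a raster label as a grid of pixel values and a vector label as a single object. The `VectorText` class, its fields, and the `floor_area_m2` attribute are hypothetical names chosen for this example; they do not represent a real CAD file format.

```python
from dataclasses import dataclass, field

# Raster: the letter "I" is only a grid of pixel values (1 = black, 0 = white).
raster_label = [
    [1, 1, 1],
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]


@dataclass
class VectorText:
    """Hypothetical vector text entity: content plus geometry plus metadata."""
    content: str
    x: float
    y: float
    height: float
    attributes: dict = field(default_factory=dict)

    def scaled(self, factor: float) -> "VectorText":
        # Scaling only changes numbers; the characters themselves stay exact.
        return VectorText(self.content, self.x * factor, self.y * factor,
                          self.height * factor, dict(self.attributes))


label = VectorText("Bedrom", x=120.0, y=80.0, height=2.5)
label.content = "Bedroom"                 # fixing a typo is a plain string edit
label.attributes["floor_area_m2"] = 14.2  # attaching extra data to the object
big = label.scaled(4)                     # four times larger, no loss of quality
```

Nothing comparable is possible with `raster_label`: correcting a typo there would mean repainting individual pixels.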


Raster vs. vector text in technical drawings

For many purposes, raster text works perfectly well. Problems arise, however, when you have a technical drawing. To use these drawings to their full potential, they must be fully editable in CAD software. As such, if you’ve got a technical drawing, you need it to be in a vector format.

Why you shouldn’t use raster text…

  • You can’t edit raster text
  • There is no structure to raster text
  • It easily becomes pixelated upon zooming or scaling
  • You can’t attach any additional data to it

…and why you should use vector text

  • It’s easy to edit vector text in CAD software
  • Vector text is a mathematically defined object
  • It retains its quality at any scale
  • You can attach specifications to the text object

The list of pros and cons lays the choice bare: vector text is simply better for technical drawings. However, while this advice is all well and good when creating a new technical drawing, it doesn’t solve the issue of what to do when you already have a technical drawing saved as a raster image.

If this is the case, then you’ll need to move on to converting your raster text to vector text.


How to convert text in technical drawings

Converting an image of an electrical schematic containing text to vector text strings

Editing technical drawings requires CAD software—and to use CAD software, you’ll need to be using a vector image. The key reasons for this are that vector images are editable and versatile.

If you were creating an electrical schematic, for example, then CAD software would enable you to define its components and materials. This, in turn, would make it much easier to produce the design when it comes to the manufacturing stage.

If you’re familiar with converting raster images to vector, you’ll know that the process, which is known as vectorization, is technically highly complex. The reason why vectorization is so tricky comes down to the fact that a raster image has no structure.

This lack of structure means that it’s difficult for software to detect what exactly the image contains. The human eye may be able to tell, for example, that an image contains the word “CAD” in black on a white background. To a machine, however, it’s just a collection of black, white and gray pixels.

In the past, the only way around this was to overlay a vector layer on top of your raster image, and manually type new vector text in place of the old raster text. This isn’t exactly the most elegant solution. Luckily, OCR software now makes it possible to automatically convert text in technical drawings.


What is OCR?

Optical Character Recognition, or OCR, is the technology which lets software detect raster text and convert it to vector text. As such, it’s OCR that enables a computer to convert text in technical drawings.

For OCR to work, it needs to be able to recognize certain letterforms. This is a fairly easy task for the human eye, but a very tricky one for a computer. Part of the reason for this is the sheer difference between how text appears in different fonts. Take, for example, the six letter ‘g’s in the image below.

Lowercase letter g in six fonts

As a human, it’s easy to tell all six of these forms represent the same letter. A computer, however, has no additional information to work off, making it hard to discern what exactly the image represents.

This is where OCR software comes in. OCR software ‘learns’ the shapes of each letter, enabling it to recognize them when they appear in an image. Initially, OCR could only recognize the forms of a single font: OCR-A, which you may recognize on your checkbook. Over time, the technology learned to recognize other common fonts, such as Times New Roman and Helvetica.

Today’s OCR software, however, has much more advanced capabilities. That’s because, instead of trying to spot a letterform in its entirety, it works on the basis of feature detection. For example, OCR may recognize that one straight horizontal line lying perpendicular atop a straight vertical line forms a capital T. This enables OCR to go beyond recognizing specific fonts and allows it to take an ‘omnifont’ approach.
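
The capital T example can be sketched as a toy feature detector. This is only an illustration of the idea of feature detection on a tiny black-and-white grid, with made-up function names; real OCR engines use far more robust methods, and this is not how any particular product works.

```python
def has_top_bar(glyph):
    """True if the first row is a full horizontal stroke."""
    return all(glyph[0])


def has_center_stem(glyph):
    """True if the middle column is filled in every row."""
    mid = len(glyph[0]) // 2
    return all(row[mid] for row in glyph)


def looks_like_capital_t(glyph):
    # A T is a top bar over a central stem, with nothing else below the bar.
    mid = len(glyph[0]) // 2
    below_bar_is_clear = all(
        pixel == 0
        for row in glyph[1:]
        for col, pixel in enumerate(row)
        if col != mid
    )
    return has_top_bar(glyph) and has_center_stem(glyph) and below_bar_is_clear


# Two differently proportioned T shapes and one L (1 = black, 0 = white).
T_NARROW = [[1, 1, 1],
            [0, 1, 0],
            [0, 1, 0]]
T_WIDE = [[1, 1, 1, 1, 1],
          [0, 0, 1, 0, 0],
          [0, 0, 1, 0, 0],
          [0, 0, 1, 0, 0]]
LETTER_L = [[1, 0, 0],
            [1, 0, 0],
            [1, 1, 1]]

results = [looks_like_capital_t(g) for g in (T_NARROW, T_WIDE, LETTER_L)]
```

Because the test looks for strokes rather than an exact pixel pattern, both T shapes pass even though their proportions differ, which is the essence of the omnifont approach.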


Exploded text vs. vector text

Some raster-to-vector conversion software can’t actually recognize text within a raster image. Instead, it converts the text into vector lines and curves. This is called ‘exploded text’, and, unfortunately, it is useless for practical purposes. You can see an example of exploded text below:

Exploded text

In this image, each letter actually comprises multiple vector lines. Put simply, it isn’t really text. When you convert your image with Scan2CAD, however, you’ll get something that looks more like this:

What you can see above is a text string. It’s actual vector text which you can edit by typing—just as you would edit text in a word processor.

In certain circumstances, you might find yourself with exploded text, either due to using sub-par conversion software or having received a file from a colleague containing it. Meanwhile, standard OCR software doesn’t concern itself with exploded text, focusing only on converting from raster to vector.

Fortunately, Scan2CAD is no standard OCR software. In fact, it has the capability to convert exploded text into text strings. This means you can undo any inadvertent errors and truly get the most out of your technical drawing.
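
To make the distinction concrete, here is a minimal sketch of how the two kinds of result might be represented as data. The dictionary layout and the `find_text` helper are assumptions invented for this example; they do not reflect the internals of Scan2CAD or of any CAD format.

```python
# Exploded text: the word "LT" stored as four unrelated line segments,
# each written as ((x1, y1), (x2, y2)). Nothing says these form letters.
exploded = [
    ((0, 0), (0, 4)), ((0, 0), (2, 0)),   # the two strokes of "L"
    ((3, 4), (7, 4)), ((5, 0), (5, 4)),   # the two strokes of "T"
]

# Text string: one entity whose content is made of real characters.
text_string = {"type": "text", "content": "LT", "position": (0, 0), "height": 4}


def find_text(entities, needle):
    """Return the entities whose text content contains `needle`."""
    return [e for e in entities
            if isinstance(e, dict) and needle in e.get("content", "")]


# Searching works only on the text string; the segments carry no characters.
hits_in_exploded = find_text(exploded, "LT")
hits_in_string = find_text([text_string], "LT")

# Editing the text string is just retyping its content.
text_string["content"] = "LT-1"
```

Changing the exploded version to read "LT-1" would instead mean drawing new segments by hand.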


Getting the best conversion results

Great OCR software is an essential tool to have in your arsenal when you convert text in technical drawings. However, there are still extra steps you should take to improve the quality of your conversion.

Choose the right raster file type

The first is to choose the right raster file type to begin with. This is of particular importance if you’re scanning in a technical drawing from paper, such as a floor plan. When scanning, Scan2CAD recommends choosing the TIFF file format. TIFF stands ahead of its rival raster formats due to its use of lossless compression. This means that the image won’t lose quality and will keep vital detail. This file format also allows you to include tags. You therefore have the option to save the file as a GeoTIFF, which you can convert for CAD and GIS.

JPG, on the other hand, is a file format to avoid. The reason for this is its use of lossy compression. This compression method compromises image quality in favor of a smaller file size. In certain circumstances, this makes sense: for example, when you need to save thousands of photos on a smartphone. For CAD purposes, however, it’s a bad trade-off.
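
The difference between the two kinds of compression can be sketched with Python’s standard library. Two assumptions apply here: `zlib` (Deflate) stands in for a lossless scheme of the sort TIFF can use, and the `quantize` function is a deliberately crude stand-in for lossy compression. JPG’s actual algorithm is quite different, so treat this purely as an illustration of the principle.

```python
import zlib

# One row of grayscale pixels (0 = black, 255 = white) with soft edges,
# repeated to mimic a scan line crossing several strokes.
pixels = bytes([255, 250, 180, 40, 0, 0, 35, 170, 248, 255] * 20)

# Lossless: compress, decompress, and get every pixel back exactly.
compressed = zlib.compress(pixels)
restored = zlib.decompress(compressed)


def quantize(data, step=64):
    """Crude lossy stand-in: round each pixel to a multiple of `step`."""
    return bytes(min(255, round(p / step) * step) for p in data)


# Lossy: detail is thrown away and cannot be recovered afterwards.
degraded = quantize(pixels)
max_error = max(abs(a - b) for a, b in zip(pixels, degraded))

is_lossless_exact = restored == pixels
is_lossy_exact = degraded == pixels
```

The lossless round trip returns the original pixels unchanged while still shrinking the data, whereas the lossy version permanently alters the soft edge pixels that OCR relies on to read letterforms.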

Unfortunately, it isn’t always possible to decide on a drawing’s file type yourself. In such cases, you don’t have any control over the quality of the image when you open it. You can, however, improve the image’s quality so that it is more suitable for conversion.

Clean up your text

To get the best possible results when you convert text in technical drawings, you’ll need to ensure that the raster text you’re working with is up to scratch. After all, even the best OCR software out there can’t decipher gibberish. You know what they say: garbage in, garbage out!

Here at Scan2CAD, we created a Raster Text Quality Checklist to help you stay on the straight and narrow. Before attempting any conversion, you should ensure that any text characters in your raster image are:

  • Are easily legible
  • Do not touch each other
  • Do not touch other elements within your drawing
  • Are not at different orientations
  • Are in a font that Scan2CAD can recognize
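
The “do not touch” items on this checklist can be approximated with a simple bounding-box test. The sketch below assumes you already have a box for each character in the form (left, top, right, bottom); the function names and the `gap` parameter are hypothetical and are not part of Scan2CAD.

```python
from itertools import combinations


def boxes_touch(a, b, gap=1):
    """Boxes are (left, top, right, bottom) in pixels.

    Returns True if the boxes overlap or sit closer than `gap` pixels apart.
    """
    horizontally_apart = a[2] + gap <= b[0] or b[2] + gap <= a[0]
    vertically_apart = a[3] + gap <= b[1] or b[3] + gap <= a[1]
    return not (horizontally_apart or vertically_apart)


def touching_pairs(boxes, gap=1):
    """List index pairs of character boxes that fail the 'do not touch' check."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(boxes), 2)
            if boxes_touch(a, b, gap)]


# Hypothetical boxes for the characters "C", "A", "D"; "A" and "D" collide.
chars = [(0, 0, 10, 12), (14, 0, 24, 12), (24, 0, 34, 12)]
problems = touching_pairs(chars)
```

Any pair reported in `problems` is a candidate for manual separation before running OCR.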

We also have a few pro tips at hand to help you turn poor quality text into something that’s ready for vectorization.

  • If you plan on using a non-standard font, be sure to train Scan2CAD’s neural networks first. Proper training can help ensure that Scan2CAD recognizes the characters present in your image.
  • Sometimes, characters may touch. In these instances, it is difficult for OCR software to tell where one character ends and the next begins. Scan2CAD’s Split tool can help you to separate these characters for better results.
  • In some cases, your text may simply not be legible. If this is the case, even a human might struggle to tell what the text really says—let alone OCR software! Your best bet here may be to simply type over the text.

Use the right software

On the left is text converted using an online converter. On the right is text converted by Scan2CAD. The difference is clear.

This is something we can’t stress enough: choosing the right conversion software is make-or-break for your vector text.

As we’ve noted, some raster-to-vector software simply can’t tell the difference between text and other elements within an image. Using such software will provide you with near-useless exploded text—and a resultant headache.

Poor-quality conversion results are a common pitfall of online file converters. Unfortunately, this is far from the worst problem they can cause. In fact, using an online file converter can endanger both the privacy of your intellectual property and the security of your system. Putting all of this at risk for the sake of sub-par vector text just isn’t worth it.

The smart move here is to use dedicated software for the conversion of technical drawings to vector images. Scan2CAD, therefore, is the natural choice. It excels at converting text and images, with over 20 years at the cutting edge of vectorization. With its neural networks able to understand text of all varieties, it’s one step ahead of the game.


How to convert text in technical drawings using Scan2CAD

With so much technical know-how involved in the creation of great OCR software, you might expect the conversion process itself to be similarly tricky. Thankfully, you’d be wrong: it’s easy as pie to convert text in technical drawings with Scan2CAD.

In the following video we convert a technical drawing which contains text. Notice how the appropriate elements are converted using OCR and the other elements in the image are vectorized. For this we use a process called object identification. After conversion we can directly edit the text strings.
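
The idea of sending text-like elements to OCR and everything else to vectorization can be sketched with a simple size-based rule. This is a hypothetical heuristic written for illustration only: the `route_elements` function, its `tolerance` parameter, and the width limit are assumptions, and Scan2CAD’s actual object identification is not described in this article.

```python
def route_elements(boxes, char_height, tolerance=0.3):
    """Split bounding boxes (left, top, right, bottom) into OCR candidates
    and shapes for vectorization, based on size alone."""
    to_ocr, to_vectorize = [], []
    for box in boxes:
        width = box[2] - box[0]
        height = box[3] - box[1]
        is_char_sized = (
            abs(height - char_height) <= tolerance * char_height
            and width <= 1.5 * char_height
        )
        (to_ocr if is_char_sized else to_vectorize).append(box)
    return to_ocr, to_vectorize


elements = [
    (10, 10, 18, 22),    # 8 x 12: about the size of one character
    (20, 10, 29, 22),    # 9 x 12: another character
    (0, 40, 300, 42),    # 300 x 2: a long wire
    (50, 60, 110, 120),  # 60 x 60: a component symbol
]
ocr_boxes, shape_boxes = route_elements(elements, char_height=12)
```

A rule like this depends on knowing the expected character size up front, which is one plausible reason the conversion workflow asks you to select a character size before running OCR.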

Video transcript

In this video, we’ll be converting this technical drawing, which contains text and a lot of other objects, which you may see in an electrical schematic or other technical drawings. And we’ll be converting this into vector. What we want to do is recognize text in the image, which is a raster image. We want to recognize text in here using OCR, but we also want to convert the other objects which are not characters to their appropriate vector elements. To do that, we need to use Scan2CAD’s object recognition, sending elements in the image that look like text to our OCR and elements that don’t look like text to our vectorization. First, we’ll quickly run a threshold to make sure the image is suitable for conversion. Okay, I’ve just set the threshold level somewhere around there, where it’s okay. I’m not going to continue with any other raster effects. Raster effects are tools for cleaning up the image to make it suitable for conversion. What we’ll do is just go straight into a vectorization now, and I’m gonna choose the technical vectorization, meaning we want to use Scan2CAD’s object recognition, and we’ll choose electrical as the default vectorization options.

This video isn’t intended to be a tutorial for all the options within Scan2CAD, so we won’t go over the object identification options and so on. What we’ll do is turn on vectorize and OCR, and go to the OCR box, turn on the image in the preview, so we can select from image and select the character size that we need to run the OCR on. Okay, I’m happy with everything as it is. We don’t need to enable vertical, ’cause there’s no vertical text in this drawing. So we’ll click Run. This runs the vectorization and the OCR. And it’s now complete. I’m reasonably happy with that by looking at the preview, so I’m gonna click okay to save that to my canvas. What we’re viewing right now is both the raster image and the vector image. I’ll go to View, just out of the view of this video, and click View Vector Colors. That shows us the type of each vector by its color: red represents vector lines, and blue, as you can see over here, represents the vector circle objects. We have pink representing vector arc objects, but we also have text. So, I’m gonna turn off the raster image, so we can just view the vector and zoom in. Let’s have a look.

So we can see the vector text now, which is fully editable. If we wanted to, we could just click Edit Text and edit accordingly. Comparing it to the raster image, it looks very good and I’m happy with the conversion.
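
The threshold step mentioned in the transcript can be sketched in a few lines. This is generic global thresholding on a small grid of made-up grayscale values, offered as an illustration of the concept; it is not a description of how Scan2CAD implements its threshold tool.

```python
def threshold(gray, level=128):
    """Turn a grayscale grid (0-255) into black (1) and white (0) pixels."""
    return [[1 if pixel < level else 0 for pixel in row] for row in gray]


# A tiny scanned patch: light paper, dark ink, and one mid-gray smudge (120).
scan = [
    [250, 245, 120, 240],
    [248,  30,  25, 243],
    [251,  28, 200, 247],
]

binary_default = threshold(scan, level=128)  # the smudge is kept as ink
binary_stricter = threshold(scan, level=100)  # the smudge is dropped as paper
```

Moving the level changes which mid-gray pixels count as ink, which is why the threshold is adjusted by eye until the text and linework look clean.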

 
