Raster Text Quality Checklist for OCR Text Recognition
Scan2CAD can convert raster text to vector text using OCR. When you convert raster text using OCR, the resulting vector text is proper editable text rather than a series of uneditable lines and arcs.
Scan2CAD’s OCR recognizes raster text where the following conditions are met:
- The raster text is easily legible.
- The raster text characters do not touch each other.
- The raster text characters do not touch other drawing elements.
- The raster text characters are not at different orientations.
- The raster text characters are in a font that Scan2CAD can recognize.
To ensure that your text meets these conditions, work through the following Raster Text Quality Checklist.
First, place your cursor over a piece of text on your image. Press M to Magnify. Press M repeatedly until your image is highly magnified. Alternatively, zoom in by scrolling your mouse wheel forward. To zoom out again, click .
Is the text easily legible?
If you cannot read the text easily, as in the examples below, Scan2CAD won’t be able to read it either.
If the text is not easily legible, the only remedy is to start off with a better quality raster image.
If this is not possible, you will have to retype the text manually. You can do this either in Scan2CAD or in your CAD program after you have imported the converted file into it.
You may want to erase areas of very poor quality text from the raster image so that these areas are not vectorized to lines and arcs.
Are the characters touching?
Scan2CAD cannot recognize characters that touch other characters, even if the characters are only connected by a few pixels:
If the characters touch, try selecting OCR > Settings > Split before running OCR. When this option is selected, Scan2CAD will attempt to split and identify touching characters.
This will improve text recognition on some raster images; on others, however, it may result in many “junk characters” being recognized. This is because characters that touch are often of very poor quality and remain unrecognizable even after splitting. For example, the characters in the example above have “bled”. Not only has this caused them to touch each other, but it has also filled in the “A”. The “A” is therefore no longer typical of an “A”, and Scan2CAD may have difficulty recognizing it even if it is not touching other characters.
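To see why even a few connecting pixels matter, the following is a minimal Python sketch of connected-component counting, a common first step in character segmentation. It is an illustration only and makes no claim about how Scan2CAD works internally; the function name count_components and the tiny "HT" bitmaps are invented for this example.

```python
from collections import deque


def count_components(rows):
    """Count 8-connected groups of '#' pixels in a list of equal-length strings."""
    height = len(rows)
    width = len(rows[0]) if rows else 0
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if rows[y][x] != "#" or seen[y][x]:
                continue
            # Found an unvisited ink pixel: flood-fill everything joined to it.
            count += 1
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and rows[ny][nx] == "#" and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


# Hypothetical bitmaps: an "H" and a "T" with a clear gap between them.
separate = [
    "#.#..###",
    "#.#...#.",
    "###...#.",
    "#.#...#.",
    "#.#...#.",
]

# The same letters, joined by a two-pixel bridge in the top row.
touching = [
    "#.######",
    "#.#...#.",
    "###...#.",
    "#.#...#.",
    "#.#...#.",
]

print(count_components(separate))  # two separate shapes
print(count_components(touching))  # one merged shape
```

In the second bitmap the two letters form a single shape, so a recognizer that treats each connected shape as one character would be presented with a combined "HT" blob that matches no known letter.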
You can often improve the quality of an image that has bled by rescanning it in grayscale and thresholding it (see the Scanning Checklist).
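The effect of thresholding a grayscale scan can be sketched in a few lines of Python. This is a simplified illustration under assumed values, not Scan2CAD's own thresholding: the function name threshold, the pixel values, and the two cutoffs are all hypothetical. Pixel values run from 0 (black) to 255 (white).

```python
def threshold(gray_rows, cutoff):
    """Convert grayscale rows to '#'/'.' rows: pixels darker than cutoff become ink."""
    return ["".join("#" if value < cutoff else "." for value in row)
            for row in gray_rows]


# Hypothetical scan line: two dark strokes joined by a mid-gray smear of bled ink.
gray = [
    [20, 30, 150, 160, 25, 20],
]

print(threshold(gray, 200))  # high cutoff: the smear counts as ink, strokes merge
print(threshold(gray, 100))  # lower cutoff: the smear is dropped, strokes separate
```

A grayscale scan keeps the difference between solid ink and faint bleed, so the cutoff can be chosen afterwards. A scan made directly in black and white has already discarded that information.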
Is the text written over other drawing elements?
If text is written over drawing elements, or is attached to underlining or boxes as in the examples below, Scan2CAD won’t be able to recognize it.
Is the text at more than one orientation?
Where text at one orientation is intermingled with text at another orientation, it is virtually impossible to recognize all the text.
Can Scan2CAD recognize the font?
By default, Scan2CAD can only recognize text that has been written using a standard font such as the font in the example below.
It may not recognize other fonts well. It may also fail to recognize standard fonts that are narrower or wider than normal or that are italicized.
If Scan2CAD’s default text recognition cannot recognize a font well and you have a lot of images containing that font, you can train Scan2CAD to recognize the font (Pro version only). You can do this if the font characters are consistent and do not touch. For example:
Scan2CAD’s default text recognition will recognize this font, but not optimally, because the font is narrower than normal. You could train Scan2CAD to recognize this font well.
Scan2CAD’s default text recognition will recognize this font very poorly because it is italicized and handwritten. However, because the characters are clear and do not touch, you could train Scan2CAD to recognize it.
Scan2CAD’s default text recognition will recognize this font very poorly because it is handwritten and because the characters touch each other. You could not train Scan2CAD to recognize this font because the characters touch.
Although the quality of this text is poor, you could train Scan2CAD to recognize it because the characters are consistent and do not touch each other.
It takes a few hours to train Scan2CAD to recognize a font, but doing so can significantly improve text recognition.