⛔️ This article is for a legacy version of Scan2CAD. See our updated tutorials. ⛔️



What is a training set?


A training set is a set of example characters from which a neural network learns to recognize a font.


How many example characters do you need?

You should include several examples of each character in a training set to help the neural network learn to recognize character variations.

The number of examples of each character you include in a training set depends on the variation in the font you are trying to recognize. If your raster images are near-perfect quality, the characters will be very uniform and you will need fewer examples. If your raster images are poor quality, you will need more examples to cover the variations in the characters.

As a rule of thumb, it is a good idea to start with two examples of each character in a training set. After you have trained the neural network, test it and see if any characters have not been correctly recognized. Add these incorrectly recognized characters to the training set as examples and then train a new neural network using the modified training set.
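The train, test, and extend cycle described above can be sketched as a short loop. This is only an illustration of the workflow, not Scan2CAD's own code or API: `refine_training_set`, `toy_train`, and the string image IDs standing in for character bitmaps are all hypothetical, and the "network" here is just a lookup table that memorizes its examples.

```python
from typing import Callable, List, Tuple

# (label, image_id): image_id is a hypothetical stand-in for a character bitmap.
Example = Tuple[str, str]
Recognizer = Callable[[str], str]


def refine_training_set(
    training_set: List[Example],
    test_samples: List[Example],
    train: Callable[[List[Example]], Recognizer],
    max_rounds: int = 5,
) -> Tuple[List[Example], int]:
    """Train, test, add misrecognized characters, and retrain.

    Returns the final training set and the number of training rounds run.
    """
    training_set = list(training_set)  # do not modify the caller's list
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        recognize = train(training_set)
        # Characters the current network gets wrong and that are not examples yet.
        to_add = [
            (label, image)
            for label, image in test_samples
            if recognize(image) != label and (label, image) not in training_set
        ]
        if not to_add:
            break
        training_set.extend(to_add)
    return training_set, rounds


def toy_train(examples: List[Example]) -> Recognizer:
    """Hypothetical trainer: memorizes image IDs instead of learning shapes."""
    table = {image: label for label, image in examples}
    return lambda image: table.get(image, "?")


# Start with two examples of each character, as the rule of thumb suggests.
start = [("A", "a1"), ("A", "a2"), ("B", "b1"), ("B", "b2")]
tests = [("A", "a1"), ("A", "a3"), ("B", "b2")]
final_set, rounds_run = refine_training_set(start, tests, toy_train)
```

In this toy run the sample `a3` is missed in the first round, added as a third example of `A`, and recognized after retraining.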

A training set can contain a maximum of 32,000 example characters in total, representing a maximum of 128 different characters.
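If you keep a list of the characters you plan to include, a quick check against these two limits might look like the sketch below. The function name `check_limits` and the idea of representing a training set as a list of labels are assumptions made for illustration. Only the two numeric limits come from this article.

```python
from collections import Counter
from typing import Iterable, List

MAX_EXAMPLES = 32_000  # total example characters allowed in one training set
MAX_DISTINCT = 128     # different characters those examples may represent


def check_limits(labels: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means both limits are met."""
    counts = Counter(labels)
    total = sum(counts.values())
    problems = []
    if total > MAX_EXAMPLES:
        problems.append(f"{total} examples exceeds the limit of {MAX_EXAMPLES}")
    if len(counts) > MAX_DISTINCT:
        problems.append(
            f"{len(counts)} different characters exceeds the limit of {MAX_DISTINCT}"
        )
    return problems


# Two examples each of A and B is well within both limits.
small_set_problems = check_limits(["A", "A", "B", "B"])
```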


Characters that you can’t include in a training set

You cannot include the following characters in a training set:


. period
, comma
; semi-colon
: colon
? question mark


You also cannot include non-contiguous characters. These are characters made up of more than one part, where each part is fully surrounded by white space, for example the copyright symbol ©. The letters i and j are also non-contiguous because of their dot. To include them, use the main body of the character only, not the dot.
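One way to picture "non-contiguous" is to count the separate groups of ink pixels in a character. The sketch below does this with a flood fill over a small text bitmap. It is an illustration only: the `count_parts` function, the `#` and `.` bitmap format, and the choice to treat diagonally touching pixels as connected are all assumptions, not a description of how Scan2CAD itself decides.

```python
from typing import List


def count_parts(bitmap: List[str], ink: str = "#") -> int:
    """Count groups of connected ink pixels (diagonal contact counts as connected)."""
    seen = set()
    parts = 0
    for r, row in enumerate(bitmap):
        for c, pixel in enumerate(row):
            if pixel != ink or (r, c) in seen:
                continue
            # Found an unvisited ink pixel: flood-fill the whole part it belongs to.
            parts += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (
                            0 <= ny < len(bitmap)
                            and 0 <= nx < len(bitmap[ny])
                            and bitmap[ny][nx] == ink
                            and (ny, nx) not in seen
                        ):
                            seen.add((ny, nx))
                            stack.append((ny, nx))
    return parts


# A lowercase i: the dot and the body are separated by a blank row.
LETTER_I = [
    ".#.",
    "...",
    ".#.",
    ".#.",
    ".#.",
]

# An L drawn as a single connected stroke.
LETTER_L = [
    "#..",
    "#..",
    "###",
]

i_parts = count_parts(LETTER_I)
l_parts = count_parts(LETTER_L)
```

A character that yields more than one part under this check is non-contiguous in the sense described above. For i and j, cropping the example to the main body leaves a single part.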

