Behind-the-scenes: Raster-to-vector conversion algorithms

Updated May 4, 2019

Scan2CAD is the ultimate vectorization solution, allowing users to convert from raster to vector with just a few clicks. This process is useful to users in a wide range of fields. If you’re dealing with technical drawings, maps, and schematics, then vector files are necessary for analysis. Meanwhile, if you’re in the business of design and manufacturing, then you need vector files that you can work with in computer-aided design (CAD) programs, on CNC machines, and so on.

You can convert from raster to vector either manually or by using an automated computer algorithm. When converting manually, an artist traces over the raster image using drawing software, a tablet, and a stylus (or even a mouse, if they are incredibly proficient!). Alternatively, a computer programmer can write an algorithm: a set of rules or instructions that a computer follows in order to perform a calculation. Read on to learn how this process works.

To a human, vectorization appears to be a single process. From a computer’s point of view, however, it is a combination of several smaller algorithms, each of which controls a specific part of the process. The software uses these algorithms to analyze the raster image and then create a vector representation of it. The procedure involves three main stages: pre-processing, processing, and post-processing.

Step 1: Pre-processing

The purpose of pre-processing is, quite simply, to prepare the raster image for vectorization. The type of pre-processing work that needs to be done depends on the type and quality of the input image. Here are a few techniques that the vectorization software employs to produce optimum vector output:

  • Reduce color. Vectorization works best when the initial raster image has as few colors as possible. To achieve this, grayscale images are binarized, and all gray elements in the image are converted to black or white pixels. Meanwhile, the software reduces the number of colors present in a color image to the minimum possible.
  • Reduce noise. You may not notice noise when viewing your raster image, but it can seriously affect the quality of your vector output. Noise appears in raster images, especially scanned images, for many reasons: a low-quality original sketch, paper defects, non-optimal threshold settings, or non-uniform lighting in your scanner. Vectorization software removes dust, speckles, and unwanted spots. Noise pixels are identified by comparing them with their neighbouring pixels: shapes and objects are structured, whereas noise pixels are random and usually form smaller clusters. Filters use rules to accept or reject each pixel; smarter algorithms analyze the local pixel neighbourhood and adapt the filter dynamically.
  • Apply thresholding. Thresholding converts the shades of gray in an image into black and white pixels: every pixel darker than a chosen threshold level becomes black, and every pixel lighter than it becomes white. This creates a sharp distinction between a white background and a black foreground, making the image easier to vectorize.
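The noise-reduction bullet above describes filters that accept or reject pixels based on their neighbourhood. As a minimal sketch (not Scan2CAD’s actual filter), the Python below assumes a binary image stored as a list of rows with 1 for ink and 0 for background, groups ink pixels into 8-connected components, and erases any component smaller than a hypothetical `min_size` parameter. This is the simple, fixed-rule kind of filter; the smarter, adaptive filters mentioned above would tune the rule to the local neighbourhood.

```python
from collections import deque


def remove_speckles(image, min_size):
    """Return a copy of a binary image with small ink blobs turned white.

    image: list of rows, 1 = black ink, 0 = white background.
    min_size: hypothetical tuning parameter; connected components with
    fewer pixels than this are treated as noise.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one 8-connected component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small, isolated blobs are assumed to be dust or speckles.
            if len(component) < min_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out
```

For example, with `min_size=3` a five-pixel line survives while an isolated single pixel is erased. Setting `min_size` too high risks deleting genuine small features such as dots or short dashes.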
Figure: Thresholding in raster-to-vector conversion. Different threshold levels make a visible difference to your image, so you can include as much or as little detail as you want.
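One way to choose a threshold automatically is Otsu’s method, which picks the gray level that best separates the pixels into a dark class and a light class. The Python below is an illustration under assumptions, not the method any particular product uses: gray levels are integers from 0 (black) to 255 (white), pixels at or below the threshold become ink (1), and the helper names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Pick the gray level that best separates dark from light pixels.

    pixels: flat list of integer gray levels in 0..255.
    Returns t such that values <= t are treated as foreground (ink).
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of gray levels at or below the candidate threshold
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant); Otsu maximizes this.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image, threshold):
    """Map each gray pixel to 1 (black ink) or 0 (white background)."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]
```

On the tiny image `[[20, 30, 220], [25, 210, 230]]`, the search returns 30, and binarizing gives `[[1, 1, 0], [1, 0, 0]]`. Adjusting the threshold by hand instead trades detail against noise.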

Step 2: Processing

This stage is where the conversion from raster to vector happens. First, the program finds the lines in the raster image, where each line is essentially a chain of pixels. There are two main approaches to “find the line”:

Thinning-based methods

This approach erodes the image down to its “skeleton”, a line drawing that is only one pixel thick. If a raster line is too thick, the software may wrongly convert it into several parallel lines, which is why reducing it to a single-pixel skeleton matters. Several algorithms can be used to thin images, including Rosenfeld thinning, Stentiford thinning, and Zhang-Suen thinning. (Edge detection methods, such as the Canny edge detector, are sometimes mentioned alongside these, but they find the boundaries of shapes rather than their skeletons.) For the less technically savvy, thinning is like “peeling an onion”: an iterative process that strips away the outer layer of pixels until no pixel can be removed without altering the shape.
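Zhang-Suen thinning, one of the algorithms listed above, follows the “peeling” idea directly: it alternates two sub-passes that each delete boundary pixels whose removal cannot break the shape, and it stops when a full pass deletes nothing. The Python below is a minimal sketch of the textbook algorithm, assuming a binary image as a list of rows (1 = ink, 0 = background) and treating pixels outside the image as background. It is not tuned for speed and is not Scan2CAD’s implementation.

```python
def zhang_suen_thin(image):
    """Thin a binary image (1 = ink, 0 = background) to a 1-pixel skeleton."""
    img = [row[:] for row in image]
    h, w = len(img), len(img[0])

    def px(y, x):
        # Pixels outside the image count as background.
        return img[y][x] if 0 <= y < h and 0 <= x < w else 0

    def neighbours(y, x):
        # P2..P9, clockwise starting from the pixel directly above.
        return [px(y - 1, x), px(y - 1, x + 1), px(y, x + 1), px(y + 1, x + 1),
                px(y + 1, x), px(y + 1, x - 1), px(y, x - 1), px(y - 1, x - 1)]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(h):
                for x in range(w):
                    if img[y][x] != 1:
                        continue
                    n = neighbours(y, x)
                    p2, _, p4, _, p6, _, p8, _ = n
                    b = sum(n)  # number of ink neighbours
                    # Number of 0 -> 1 transitions around the ring P2..P9, P2.
                    a = sum(1 for i in range(8) if n[i] == 0 and n[(i + 1) % 8] == 1)
                    if not (2 <= b <= 6 and a == 1):
                        continue
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if ok:
                        to_delete.append((y, x))
            # Deletions are applied only after the whole sub-pass has been scanned.
            for y, x in to_delete:
                img[y][x] = 0
            if to_delete:
                changed = True
    return img
```

Applied to a solid bar three pixels high and seven pixels wide, this sketch leaves a single four-pixel line along the middle row; notice that the ends of the bar are trimmed back a little.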

Contour-based methods

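This approach extracts the image contours (the outlines of each shape), matches pairs of contours that lie on opposite sides of the same stroke, and then finds the medial line between each matching pair. However, it often fails to capture correct lines at intersections, where contours are hard to pair up. (This is different from generating isolines, or contour lines, for a map, where the user defines a fixed interval at which they want contours to appear.)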
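Once two contours have been matched, finding the medial line between them can be as simple as pairing up points along each contour and taking midpoints. The Python below sketches only that final step, under stated assumptions: the two contours are already matched, both are given as polylines of `(x, y)` points running in the same direction, and corresponding points are found by resampling each contour evenly by arc length. The function names are hypothetical, and real contour-matching schemes are considerably more involved, especially at intersections.

```python
import math


def resample(polyline, n):
    """Return n points (n >= 2) spaced evenly by arc length along a polyline."""
    if n < 2:
        raise ValueError("n must be at least 2")
    # Cumulative length at each vertex.
    cum = [0.0]
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        cum.append(cum[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cum[-1]
    out = []
    seg = 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target length.
        while seg < len(polyline) - 2 and cum[seg + 1] < target:
            seg += 1
        seg_len = cum[seg + 1] - cum[seg]
        t = 0.0 if seg_len == 0 else (target - cum[seg]) / seg_len
        (x0, y0), (x1, y1) = polyline[seg], polyline[seg + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def medial_line(contour_a, contour_b, n=10):
    """Approximate the centre line between two matched, same-direction contours."""
    a = resample(contour_a, n)
    b = resample(contour_b, n)
    # The midpoint of each corresponding pair lies on the approximate medial line.
    return [((xa + xb) / 2, (ya + yb) / 2) for (xa, ya), (xb, yb) in zip(a, b)]
```

For two parallel edges at y = 0 and y = 4, `medial_line([(0, 0), (10, 0)], [(0, 4), (10, 4)], n=3)` returns points along y = 2.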

Figure: Contour matching vs. skeletonization, a comparison between contour-based and thinning-based methods. Image source:

Again, the type of method we’d use depends on the type of image. For example, thinning-based methods are very sensitive to noise. Contour-based methods, meanwhile, are more noise-tolerant, but rely on complex matching schemes. Many programs also apply two-step vectorization procedures that combine a few methods. There are also other methods such as orthogonal zig-zag, run-length encoding and sparse pixel tracking.
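Run-length encoding, one of the alternative methods mentioned above, describes each row of a binary image as runs of consecutive ink pixels rather than as individual pixels. The Python below is a minimal sketch of that representation, assuming 1 = ink and 0 = background and storing each run as a hypothetical `(start, length)` pair. It also includes an overlap test, since one plausible way to build strokes from runs (an assumption here, not a description of a specific product) is to link runs in adjacent rows that share at least one column.

```python
def run_length_encode(row):
    """Encode one row of a binary image as (start, length) runs of ink pixels."""
    runs = []
    start = None
    for x, p in enumerate(row):
        if p == 1 and start is None:
            start = x                        # a run of ink begins
        elif p != 1 and start is not None:
            runs.append((start, x - start))  # the run ended just before x
            start = None
    if start is not None:                    # the run reaches the right edge
        runs.append((start, len(row) - start))
    return runs


def runs_overlap(a, b):
    """True if runs a and b (in adjacent rows) share at least one column.

    Runs that only touch diagonally are not counted as overlapping.
    """
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]
```

`run_length_encode([0, 1, 1, 0, 0, 1, 1, 1])` gives `[(1, 2), (5, 3)]`. Working with runs can reduce the amount of data a vectorizer has to scan, since a thick horizontal line becomes a single run in each row it covers.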

After the program “finds the line”, it approximates each pixel chain it has found with vector elements. It creates a vector-based representation using elements like text, polygons, circles, arcs, Bézier curves, and lines (including dotted lines, dash-dot lines, arrows, and polylines).
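Approximating a pixel chain with a vector element generally means fitting a candidate element and checking how well it matches. As a minimal sketch for the simplest case, a straight line, the Python below fits the principal axis of the chain’s pixel centres (a total least-squares fit), takes the extreme projections onto that axis as the endpoints, and reports the largest perpendicular deviation. The function name and the idea of comparing the deviation against a tolerance are assumptions for illustration; arcs, circles, and Bézier curves would need their own fitting code.

```python
import math


def fit_segment(points):
    """Fit one straight segment to a chain of (x, y) pixel centres.

    Returns (start, end, max_dev), where max_dev is the largest perpendicular
    distance from any point to the fitted line. A caller could compare max_dev
    against a (hypothetical) tolerance to decide whether a line is good enough.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Orientation of the principal axis of the point cloud.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # Project every point onto the axis; the extreme projections give the endpoints.
    ts = [(x - cx) * dx + (y - cy) * dy for x, y in points]
    t_min, t_max = min(ts), max(ts)
    start = (cx + t_min * dx, cy + t_min * dy)
    end = (cx + t_max * dx, cy + t_max * dy)
    # Perpendicular distance of each point from the fitted axis.
    max_dev = max(abs(-(x - cx) * dy + (y - cy) * dx) for x, y in points)
    return start, end, max_dev
```

For the points `(0, 1), (1, 3), (2, 5), (3, 7), (4, 9)`, which lie on y = 2x + 1, the fitted segment runs from (0, 1) to (4, 9) with a deviation of essentially zero, up to floating-point rounding.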

Step 3: Post-processing

In this stage, the software analyzes and interprets the vector data. The aim is to remove noise from the vector model, recognize objects, and recover entities from the vector data. Typical tasks in this step include:

  • Filling gaps
  • Classifying vectors
  • Eliminating false branches
  • Rectifying right-angled corners
  • Finding the best position for junction points
  • Simplifying vectors using polygonal approximation
  • Checking for duplicates and removing/merging identical vectors that are lying on top of each other
  • Lengthening vectors and combining multiple vectors into a single vector
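The polygonal-approximation item above can be illustrated with the Ramer-Douglas-Peucker algorithm, a widely used way to simplify a polyline: if the vertex farthest from the chord between the endpoints lies more than a tolerance `epsilon` away, it is kept and both halves are simplified recursively; otherwise only the endpoints remain. The Python below is a minimal sketch of one common choice of polygonal approximation, not necessarily the one Scan2CAD uses, and `epsilon` is a hypothetical tolerance in pixel units.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment ab (or to a, if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection so distances are measured to the segment, not the infinite line.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points, epsilon):
    """Ramer-Douglas-Peucker: drop vertices within epsilon of the simplified line."""
    if len(points) < 3:
        return list(points)
    # Find the interior vertex farthest from the chord joining the endpoints.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= epsilon:
        return [points[0], points[-1]]
    # Keep that vertex and simplify both halves; drop the duplicated split point.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right
```

With `epsilon=0.5`, the slightly wobbly L-shaped chain `[(0, 0), (1, 0.1), (2, -0.1), (3, 0), (3, 1), (3, 2), (3, 3)]` simplifies to `[(0, 0), (3, 0), (3, 3)]`.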
Figure: Converting lines into polylines. One example of post-processing: the software converts two lines into a polyline.
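Converting lines into a polyline can be sketched as repeatedly attaching any segment whose endpoint matches either end of a growing chain. The Python below is a minimal, greedy sketch under stated assumptions: segments are `((x1, y1), (x2, y2))` tuples, endpoints must match exactly (a real tool would likely compare within a tolerance), and junctions where three or more segments meet are not treated specially. The function name is hypothetical.

```python
def merge_into_polylines(segments):
    """Greedily chain line segments that share endpoints into polylines.

    segments: list of ((x1, y1), (x2, y2)) tuples.
    Returns a list of polylines, each a list of (x, y) vertices.
    """
    remaining = list(segments)
    polylines = []
    while remaining:
        a, b = remaining.pop()
        chain = [a, b]
        extended = True
        while extended:
            extended = False
            for i, (p, q) in enumerate(remaining):
                if p == chain[-1]:
                    chain.append(q)        # attach at the tail
                elif q == chain[-1]:
                    chain.append(p)        # attach reversed at the tail
                elif q == chain[0]:
                    chain.insert(0, p)     # attach at the head
                elif p == chain[0]:
                    chain.insert(0, q)     # attach reversed at the head
                else:
                    continue
                del remaining[i]           # safe: we break out of the loop next
                extended = True
                break
        polylines.append(chain)
    return polylines
```

Two segments meeting at (2, 0) merge into one three-vertex polyline, while an unconnected segment stays on its own.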

There you have it! When you convert an image from raster to vector, there’s a whole lot going on in the background—all of which is controlled by pre-programmed algorithms. From a user’s point of view, though, it only takes a single mouse click.
