Keep Building

Novelty is overrated

Nov 22, 2014

Binarization Part 3

The Maximum Flow problem is well-studied and there are very efficient algorithms to solve it. It turns out to be equivalent to finding the Minimum Cut of a network as well. If we can map the binarization problem to either of these problems, the result should be fast and accurate. Let's look at what the problems are and then see how to map it.

Maximum Flow

Suppose you have a network of pipes in a water system. The pipes may be different sizes (water capacities) and connect to each other at various junctions. Suppose you have one water supplier, called the source. And somewhere else you have one customer who wants as much water as possible, called the sink. The question is, given a particular configuration of pipes and junctions, what is the most water you can supply to the customer over the network?

A simple flow network

In this example, the maximum flow possible through the network is 7 gallons per second.

Minimum Cut

Let's say you are a secret agent, trying to prevent the villain from escaping to safety. The villain will try to take a train from a starting city to their safehouse. Since the villain is wily, you know that if there is any path to safety, they will find it. But you only have so many resources and each train line you shut down has a cost. Shutting down all the train lines is far too expensive, so what is the cheapest set of train lines to shut down to guarantee success?

These two problems happen to be completely equivalent. The water source is the villain's starting city. The water sink is the villain's destination. The size of a pipe is equivalent to the cost of shutting down a train line. Once you map the one problem to the other, the total flow of water you can get through the water system is equal to the cost of trapping the villain.

A map of escape routes

In order to keep the villain away from safety at minimal cost, you have to cut off three routes whose cost adds up to $7 million. This is not a coincidence. When the flow is maximized, the water goes through pipes which collectively form a bottleneck. And those pipes, when removed, partition the graph at minimum cost.

Binarization

We can turn the binarization problem into an instance of the mincut problem by constructing a graph from the pixels of a picture. Then when we run the mincut algorithm, some of the pixels will end up on the side of the source/villain. The rest will be on the side of the customer/escape. We set one group to be white and the other to be black.

A lattice for pixels

Our main graph will form a lattice where every pixel has an edge to the four adjacent pixels. If the two pixels are in different groups, then the edge will have to be cut and this imposes a penalty. The bigger the weight, the more penalty is assessed when the two are in different groups and so the more likely they are to be in the same group.

Then we connect every pixel to both the source and sink, but vary the weights. Here is a simple way to do it. The brighter the pixel is, the more weight we assign to its link to the source and the less weight we assign its link to the sink. So bright pixels are biased toward one side and dark pixels towards the other.

The combination of these two kinds of weights helps ensure that contiguous regions of darker pixels are all tagged the same and helps reduce the chance of random speckles. If you want to see this method in action, take a look here.

Howe's binarization algorithm builds on this basic system. It builds a graph like I have described here and runs mincut on it. The tricky bit is the methods it uses for determining the weights between adjacent pixels and between the pixels and the source/sink.

Nov 21, 2014

Binarization Part 2

If we confine ourselves to binarizing documents, we want to find out every pixel that is ink and mark those as black. All the other pixels are the page background and can be marked as white.

Even this formulation papers over ambiguity. At the edge of every letter, there will be pixels, each of which cover an area which is partially covered in ink. Those pixels will be an intermediate value in the greyscale image between the ink and the page. This is one way in which we are losing information when we go to a binary image.

Before going into Howe's algorithm, I want to outline the two most common kinds of binarization algorithms. There are many variants on each of these methods.

Global Thresholds

One class of methods involves coming up with a single global threshold for the whole image. Any pixels lighter than this threshold become white and any pixel darker than the threshold become black. There are various ways to decide the threshold. You can analyze the distribution of pixel brightness and try to find a point between the cluster of page pixels and the cluster of ink pixels.

The big disadvantage is that it only works well when the page is extremely evenly lit. If one part of the page is darker than another, then a single threshold will cause similar pixels to be classified differently. Letters on one part of the page can look bolder than letters on another part. In the worst case, entire letters or features can disappear because the threshold that might work one place won't work everywhere.

Adaptive Thresholds

While I am pretty good at evenly lighting my books when scanning, I still see this kind of binarization artifact when using global thresholds. So I looked into alternatives and most of these alternatives are adaptive thresholding techniques. Adaptive thresholding means that you calculate a potentially different threshold for each pixel based on the neighborhood it is in.

This is a great advantage when dealing with uneven light. If the top part of the page is a bit brighter, then it's threshold will be a bit higher. These algorithms tend to be slower and they also tend to introduce more black speckles into the result. But some of the best algorithms currently available use an adaptive threshold.

Other Methods

Howe's algorithm uses neither kind of technique. It works by mapping the binarization problem to the maxflow/mincut problem. I will talk about how this works next time.