So what does this library do?
It does discrete energy minimization on loopy graphs. This is an important topic in computer vision, since it can be used for segmentation, stereo vision and optical flow. When you read "CRF" in a computer vision paper and there is no learning involved, it usually just means they minimized an energy function (probably using gco). Often this is just some form of "smoothing", but a lot more can be done with it.
Let's talk about pairwise energy functions. These functions take the form
E(y) = \sum_i v_i(y_i) + \sum_{ij} w_{ij}(y_i, y_j)
where the v_i and w_ij are fixed parameters and the y_i are the multinomial variables over which we want to minimize this energy. Typically, i runs over all pixels in an image, and w_ij is nonzero only for adjacent pixels.
In general, think of v and w as tables (of size n_labels or n_labels x n_labels) for each node resp. edge in the graph. In simple models, these are the same tables for all nodes / edges, in more complicated ones they vary.
In general, this problem is NP-hard, but there are many interesting cases in which it can (somewhat surprisingly) be solved efficiently and exactly.
If y_i are binary and the energy is submodular, the exact solution can be found efficiently.
If the y_i are not binary, but the energy fulfills a related property (for example, the pairwise terms form a metric on the labels), it is possible to efficiently find good, if not optimal, solutions. See the seminal paper by Yuri Boykov, Olga Veksler and Ramin Zabih, "Fast Approximate Energy Minimization via Graph Cuts".
This is where gco comes in: it implements two move-making algorithms, alpha-expansion and alpha-beta swap, which solve a sequence of binary subproblems to obtain a solution to the multi-label problem.
Going back to the applications I mentioned above, these labels could be different depths in stereo vision, different directions of movements in optical flow, or different object classes in semantic segmentation.
There are several ways to specify the energy for gco and I don't want to go into too much detail here. Have a look at the README if you are interested.
For the moment, I have ported two ways to Python (but the rest should come soon). One is for general graphs, specified by listing all edges, the other one is for 2D grid graphs (images).
In all cases, the unary potentials (the v_i above) are just given as one number for each vertex (pixel in the grid graph case) and state.
In the simple case that I wrapped for Python, the pairwise potentials (the w_ij)
are fixed over the whole graph and depend only on the combination of labels y_i and y_j.
This includes the simple Potts potential, where w_ij is zero if y_i = y_j and one otherwise.
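As a table, the Potts potential is just the complement of the identity matrix (illustrative sketch, not the wrapper's API):

```python
import numpy as np

n_labels = 4
# Potts pairwise table: zero cost for equal labels, constant cost otherwise.
# Scale by a smoothing weight (e.g. 5 * potts) to control regularization.
potts = 1 - np.eye(n_labels, dtype=int)
print(potts)
```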
If you want to use the code, you need to download the original gco library and my wrappers. There is one thing you have to keep in mind when working with gco, though: all potentials are expected to be integers, so you need to round them!
Here are some examples (available at my git repo):
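To show the data layout for the grid-graph case without requiring the compiled library, here is a self-contained stand-in that minimizes a Potts energy on a tiny 2x2 binary image by brute force (the real examples call the gco wrappers instead; exhaustive search is only feasible on toy problems, which is exactly why gco's move-making algorithms exist):

```python
import itertools
import numpy as np

# One unary table per pixel, one shared Potts pairwise table,
# 4-connected grid edges -- mimicking the grid-graph setup.
h, w, n_labels = 2, 2, 2
unary = np.array([[[0, 9], [0, 9]],
                  [[9, 0], [4, 5]]])        # shape (h, w, n_labels)
pairwise = 1 - np.eye(n_labels, dtype=int)  # Potts
edges = [((0, 0), (0, 1)), ((1, 0), (1, 1)),  # horizontal
         ((0, 0), (1, 0)), ((0, 1), (1, 1))]  # vertical

def energy(labels):
    e = sum(unary[i, j, labels[i, j]] for i in range(h) for j in range(w))
    e += sum(pairwise[labels[a], labels[b]] for a, b in edges)
    return e

# Exhaustively try all n_labels ** (h * w) labelings.
best = min((np.array(y).reshape(h, w) for y in
            itertools.product(range(n_labels), repeat=h * w)),
           key=energy)
print(best)
```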
And finally, a not-as-trivial-looking example:
The second image is translated in the x direction and the best fit for each pixel is found. The image on the top right gives the best translation for each individual pixel. The bottom images show the results obtained with alpha-expansion and Potts potentials (right) and a 1d topology (left).
This is not really how you would do stereo vision (the potentials I used are very naive) but I feel it is a nice example (about 10 lines in Python).
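Incidentally, the 1d (chain) topology mentioned above does not even need graph cuts: a chain can be minimized exactly with dynamic programming (Viterbi). A minimal sketch with made-up costs:

```python
import numpy as np

unary = np.array([[0, 3], [2, 1], [3, 0]])  # shape (n_nodes, n_labels)
pairwise = np.array([[0, 2], [2, 0]])       # Potts with weight 2

n, k = unary.shape
cost = unary[0].astype(float)
back = np.zeros((n, k), dtype=int)
for i in range(1, n):
    # trans[prev, cur]: cost of the best path ending in `prev`
    # plus the transition penalty to `cur`.
    trans = cost[:, None] + pairwise
    back[i] = trans.argmin(axis=0)
    cost = trans.min(axis=0) + unary[i]

# Backtrack the optimal labeling from the cheapest final label.
y = [int(cost.argmin())]
for i in range(n - 1, 0, -1):
    y.append(int(back[i][y[-1]]))
y.reverse()
print(y)
```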
I hope some people find this helpful and there will be more computer vision code in Python soon :)