We present an approach for identifying a set of candidate objects in a given image. This set of candidates can be used for object recognition, segmentation, and other object-based image parsing tasks. To generate the proposals, we identify critical level sets in geodesic distance transforms computed for seeds placed in the image. The seeds are placed by specially trained classifiers that are optimized to discover objects. Experiments demonstrate that the presented approach achieves significantly higher accuracy than alternative approaches, at a fraction of the computational cost.
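
For intuition, here is a minimal Python sketch of the core idea, assuming a precomputed boundary/gradient magnitude map as input: compute a geodesic distance transform from a seed pixel over the image grid and threshold it at a few levels to obtain candidate masks. This is an illustration only, not the released C++ implementation; in particular, the function names are made up and the thresholds below are fixed fractions of the maximum distance rather than the critical level sets the method actually selects.

```python
# Simplified sketch of the idea behind geodesic object proposals
# (illustrative only; not the released C++ code).
import heapq
import numpy as np

def geodesic_distance(gradient_mag, seed):
    """Dijkstra on the 4-connected pixel grid; edge cost = gradient magnitude."""
    h, w = gradient_mag.shape
    dist = np.full((h, w), np.inf)
    dist[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        d, (y, x) = heapq.heappop(heap)
        if d > dist[y, x]:
            continue
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                nd = d + gradient_mag[ny, nx]
                if nd < dist[ny, nx]:
                    dist[ny, nx] = nd
                    heapq.heappush(heap, (nd, (ny, nx)))
    return dist

def proposals_from_seed(gradient_mag, seed, levels=(0.1, 0.2, 0.4)):
    """Threshold the geodesic distance map at several levels. Here the levels
    are fixed fractions of the maximum distance; the actual method selects
    critical level sets instead."""
    dist = geodesic_distance(gradient_mag, seed)
    return [dist <= t * dist.max() for t in levels]
```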

Code

You’ll need to download both the code and the additional data, which includes pre-trained boundary detectors for Sketch Tokens and Structured Forests. The README contains instructions on how to compile and run the code. If you find a bug, please feel free to contact me. However, please don’t contact me for help compiling or running the code.

Changes

v1.0: Initial Release

v1.1: Added a Matlab wrapper, made it easier to use learned seeds and masks from Matlab and C++, and added a function to compute bounding boxes from proposals (the idea is sketched after this list).

v1.2: Added evaluation code for COCO and seed proposals (proposals consisting of just the seeds themselves). The seed proposals help for both COCO and VOC in terms of segmentation. If you’re only interested in bounding boxes, you probably don’t want to use them, since most small bounding boxes are labeled as difficult.

v1.3: Fixed compilation issues on older systems. Python 2.7 should now work too.
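
The bounding boxes mentioned in v1.1 are simply the tight extents of each proposal mask. A minimal Python sketch of that idea (an illustrative helper, not the API shipped with the code):

```python
import numpy as np

def mask_to_bbox(mask):
    """Tight bounding box (x0, y0, x1, y1) of a boolean proposal mask;
    returns None for an empty mask. Illustrative only."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```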

Matlab

After getting a few requests for Matlab binaries, here they are. I compiled them using Matlab 2014a, but at least under Linux they should also work with older Matlab versions, as long as you launch Matlab with “LD_PRELOAD=’/usr/lib/x86_64-linux-gnu/libstdc++.so.6’ matlab” to load a recent C++ standard library.

More comparisons

Several people have asked me how GOP compares to SCG/MCG. I also added the new numbers including seed proposals (GOP v1.2). Here is the comparison:

| Method | # prop. | ABO | Covering | 50%-recall | 70%-recall | Time |
| --- | --- | --- | --- | --- | --- | --- |
| CPMC | 646 | 0.703 | 0.850 | 0.784 | 0.609 | 252s |
| Cat-Ind OP | 1536 | 0.718 | 0.840 | 0.820 | 0.624 | 119s |
| Selective Search | 4374 | 0.735 | 0.786 | 0.891 | 0.597 | 2.6s |
| SCG | 2125 | 0.754 | 0.835 | 0.870 | 0.663 | 5s |
| MCG | 5158 | 0.807 | 0.868 | 0.921 | 0.772 | 30s |
| MCG (best 2200 per image) | 2199 | 0.785 | 0.861 | 0.896 | 0.720 | 30s |
| Baseline GOP (130,5) | 653 | 0.712 | 0.812 | 0.833 | 0.622 | 0.6s |
| Baseline GOP (150,7) | 1090 | 0.727 | 0.828 | 0.847 | 0.644 | 0.65s |
| Baseline GOP (200,10) | 2089 | 0.744 | 0.843 | 0.867 | 0.673 | 0.9s |
| Baseline GOP (300,15) | 3958 | 0.756 | 0.849 | 0.881 | 0.699 | 1.2s |
| Learned GOP (140,4) | 652 | 0.720 | 0.815 | 0.844 | 0.632 | 1.0s |
| Learned GOP (160,6) | 1199 | 0.741 | 0.835 | 0.865 | 0.673 | 1.1s |
| Learned GOP (180,9) | 2286 | 0.756 | 0.852 | 0.877 | 0.699 | 1.4s |
| Learned GOP (200,15) | 4186 | 0.766 | 0.858 | 0.889 | 0.715 | 1.7s |
| Baseline GOP (v1.2) (130,5) | 780 | 0.723 | 0.812 | 0.850 | 0.631 | 0.6s |
| Baseline GOP (v1.2) (150,7) | 1237 | 0.741 | 0.828 | 0.870 | 0.657 | 0.65s |
| Baseline GOP (v1.2) (200,10) | 2281 | 0.759 | 0.843 | 0.892 | 0.688 | 0.9s |
| Baseline GOP (v1.2) (300,15) | 4242 | 0.771 | 0.849 | 0.910 | 0.711 | 1.2s |
| Learned GOP (v1.2) (140,4) | 754 | 0.731 | 0.815 | 0.865 | 0.640 | 1.0s |
| Learned GOP (v1.2) (160,6) | 1284 | 0.751 | 0.836 | 0.882 | 0.684 | 1.1s |
| Learned GOP (v1.2) (180,9) | 2319 | 0.767 | 0.851 | 0.891 | 0.710 | 1.4s |
| Learned GOP (v1.2) (200,15) | 4104 | 0.777 | 0.859 | 0.903 | 0.725 | 1.7s |
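
The segmentation metrics above follow the usual definitions: ABO is the best proposal-to-ground-truth IoU averaged over ground-truth objects, Covering is its area-weighted analogue, and the 50%/70%-recall columns give the fraction of ground-truth objects whose best IoU exceeds 0.5 or 0.7. A minimal Python sketch of these computations for a single image (my own sketch, not the evaluation code shipped with GOP):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean segmentation masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def segmentation_metrics(gt_masks, proposals, thresholds=(0.5, 0.7)):
    """ABO, covering and recall@IoU for one image (sketch of the standard
    definitions, not the shipped evaluation code)."""
    best = np.array([max(iou(gt, p) for p in proposals) for gt in gt_masks])
    areas = np.array([gt.sum() for gt in gt_masks], dtype=float)
    abo = float(best.mean())                           # Average Best Overlap
    covering = float((areas * best).sum() / areas.sum())  # area-weighted ABO
    recall = {t: float((best >= t).mean()) for t in thresholds}
    return abo, covering, recall
```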

I also got some requests to compare against Edge Boxes and BING, so here is that comparison (using the 70% IoU variant of Edge Boxes):

| VUS (2000 windows) | Linear | Log |
| --- | --- | --- |
| BING | 0.278 | 0.189 |
| Objectness | 0.323 | 0.225 |
| Edge Boxes | 0.526 | 0.320 |
| Randomized Prim | 0.511 | 0.274 |
| Selective Search | 0.528 | 0.301 |
| GOP | 0.546 | 0.310 |
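
As I read them, the VUS (volume under surface) numbers integrate recall over both the IoU threshold and the number of windows (up to 2000), with the window axis scaled either linearly or logarithmically; the authoritative definition is in the evaluation code. A rough Python sketch under that assumption:

```python
import numpy as np

def volume_under_surface(recall, n_windows, log_axis=False):
    """Rough sketch of a VUS-style score (assumed definition, see note above).
    recall[i, j] = detection recall at IoU threshold i when keeping the top
    n_windows[j] proposals. Integrates over the (normalized) window axis and
    averages over IoU thresholds."""
    x = np.asarray(n_windows, dtype=float)
    if log_axis:
        x = np.log(x)
    x = (x - x.min()) / (x.max() - x.min())   # normalize window axis to [0, 1]
    per_iou = np.trapz(recall, x, axis=1)     # area under recall vs. #windows
    return float(per_iou.mean())              # average over IoU thresholds
```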