Free Supervision from Video Games
Data
The data provided is for research and educational use only. Commercial use is prohibited. If you use the data, please buy the game(s).
About the data
Both train and test data are split into roughly 30s continuous clips.
For each frame (recorded or not), a `*_state.json` file contains basic information about the frame (including the player position, heading, control signal, weather, …).
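Until the official reader code is released, here is a minimal sketch of peeking at one of these state files. The clip path and frame name below are hypothetical, and the exact field names are best discovered by inspecting a real file:

```python
import json
from pathlib import Path

# Hypothetical path; the actual clip/frame layout may differ.
state_path = Path("train/clip_0000/000042_state.json")
state = json.loads(state_path.read_text())

# The list above names player position, heading, control signal and weather
# among the contents; print the keys to see what a real file provides.
print(sorted(state.keys()))
```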
We record all image modalities at 6 FPS.
All modalities are stored as compressed images (png or webp); a sketch of how to decode them follows this list:
- `images` and `albedo` are stored directly as color images (lossy webp or lossless png).
- The `segmentation` map contains both instance and semantic segmentation. The R channel corresponds to the object type, while GB correspond to a 16-bit integer identifying the object id. The id is persistent across time (it tracks objects).
- The `flow` image is a 24-bit color image: the first 12 bits hold the horizontal (x/u) component, the second 12 bits the vertical (y/v) component. To convert a 12-bit integer x to the actual flow value, use x / 4 - 512. The flow range is -512 .. 512 at quarter-pixel accuracy (0.25, 0.5, 0.75). Flow outside that range is clipped (this happens in less than 0.1% of all pixels). A flow of 0 (as the 12-bit number) means the flow is not defined.
- The `disparity` image is a 24-bit color image. To convert the 24-bit integer x to the actual disparity (1/depth) value, use x / 8192. The disparity is currently clipped at 7 (and is by definition greater than 0). A disparity of 0 means that there was likely no object drawn at that location.
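Official reader code is announced below; until then, here is a minimal numpy sketch of the conversions above. The bit packing across the R/G/B channels (R as the most significant byte, G before B) is an assumption and should be verified against a real file:

```python
import numpy as np
from PIL import Image

def decode_segmentation(path):
    rgb = np.asarray(Image.open(path)).astype(np.uint16)
    semantic = rgb[..., 0]                        # R: object type
    instance = (rgb[..., 1] << 8) | rgb[..., 2]   # GB: 16-bit object id (G assumed high byte)
    return semantic, instance

def decode_flow(path):
    rgb = np.asarray(Image.open(path)).astype(np.uint32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    u_raw = (r << 4) | (g >> 4)          # first 12 bits: horizontal (x/u) component
    v_raw = ((g & 0xF) << 8) | b         # second 12 bits: vertical (y/v) component
    valid = (u_raw != 0) & (v_raw != 0)  # a raw value of 0 marks undefined flow
    flow = np.stack([u_raw, v_raw], -1).astype(np.float32) / 4.0 - 512.0
    flow[~valid] = 0.0
    return flow, valid

def decode_disparity(path):
    rgb = np.asarray(Image.open(path)).astype(np.uint32)
    x = (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]
    return x.astype(np.float32) / 8192.0  # disparity = 1/depth, 0 where nothing was drawn
```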
Code to read these images will be released soon.
Overfitting to the test data
Please don’t train on the test data! Out of scientific curiosity I collected an additional test set, slated to be released in 2021 (or whenever the dataset becomes irrelevant). I hope to benchmark the top methods on that held-out dataset at that time. Feel free to overfit as you see fit…