Summary

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving, Wang, Chao, Garg, Hariharan, Campbell, Weinberger; 2018 - Summary

author:	biofizzatreya
score:	9 / 10

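Summarize the paper: The paper proposes converting image-based depth estimates (from stereo or monocular images) into a LIDAR-like 3D point cloud, so that existing LIDAR-based 3D object detectors can be applied to camera data. The authors argue that the gap between image-based and LIDAR-based detection comes mainly from the data representation rather than from the quality of the depth estimates: in a frontal-view depth map, objects far away appear smaller and neighboring pixels can belong to objects at very different depths, so traditional conv-nets fail to detect them properly.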

How is it realized (technically)? The algorithm first uses a pair of stereoscopic images to estimate a disparity map, which is converted into a depth map using the standard stereo relation: depth = (focal length × baseline) / disparity.
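
The disparity-to-depth step can be sketched in a few lines of NumPy. This is only an illustration of the standard stereo relation (depth = focal length × baseline / disparity), not the authors' code; the function name `disparity_to_depth`, the `min_disparity` cutoff and the numeric values in the usage note are assumptions made for the example.

```python
import numpy as np


def disparity_to_depth(disparity, focal_length_px, baseline_m, min_disparity=1e-6):
    """Convert a disparity map (in pixels) to a depth map (in meters).

    Uses depth = focal_length * baseline / disparity. Pixels whose disparity
    is at or below `min_disparity` (an assumed cutoff) get depth = inf,
    since their depth cannot be triangulated.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > min_disparity
    depth[valid] = focal_length_px * baseline_m / disparity[valid]
    return depth
```

For example, with a hypothetical focal length of 720 pixels and a baseline of 0.5 m, a disparity of 72 pixels corresponds to a depth of 5 m, and halving the disparity doubles the depth. This inverse relation is why small disparity errors translate into large depth errors for distant objects.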

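Then, this depth map is used to calculate the x, y and z coordinates of every pixel (u, v) using the camera intrinsics: z is the depth, x = (u - c_U) * z / f_U and y = (v - c_V) * z / f_V, where (c_U, c_V) is the camera center and f_U, f_V are the focal lengths. The resulting set of 3D points forms a point cloud representation.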
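
The back-projection from a depth map to a point cloud can be sketched as below, assuming a pinhole camera model with focal lengths `fx`, `fy` and principal point `(cx, cy)`. The function name `depth_to_point_cloud` and the `max_depth` filter are assumptions made for this illustration, and the points are returned in the camera frame (x right, y down, z forward).

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy, max_depth=80.0):
    """Back-project a depth map of shape (H, W) into an (N, 3) point cloud.

    For each pixel (u, v) with depth z:
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
    Pixels with non-finite, non-positive, or too distant depth are dropped
    (`max_depth` is an assumed cutoff, not a value from the paper).
    """
    depth = np.asarray(depth, dtype=np.float64)
    rows, cols = depth.shape
    # u indexes columns (horizontal), v indexes rows (vertical).
    u, v = np.meshgrid(np.arange(cols), np.arange(rows))
    valid = np.isfinite(depth) & (depth > 0) & (depth <= max_depth)
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

Each row of the returned array is one 3D point, so the output can be handed to any code that expects an unordered list of (x, y, z) points, which is the same form a LIDAR scan takes once intensity values are set aside.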

The point cloud is called a pseudo-LIDAR signal. The pseudo-LIDAR representation, along with the monocular images, is fed into existing 3D object detection pipelines.

How well does the paper perform?

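The paper does not develop a new neural network architecture; instead, it applies existing LIDAR-based 3D object detection architectures (AVOD and Frustum PointNet) to the pseudo-LIDAR point cloud. The approach is evaluated on the KITTI dataset, where the authors report raising image-based detection accuracy for objects within the 30 m range from the previous state of the art of 22% to 74%. The pseudo-LIDAR point cloud is also compared against the real LIDAR data to check its accuracy.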

What interesting variants are explored? The paper also attempts to detect pedestrians and cyclists, which is a much harder task given how small they appear in the images compared to cars. These classes are evaluated at an IoU threshold of 0.5, compared to 0.7 for cars.

TL;DR

A method to convert images to LIDAR-like point clouds
Pseudo-LIDAR brings image-based 3D object detection much closer to the level of LIDAR-based detection
Combined with monocular images, it makes 3D object detection less expensive than relying on a LIDAR sensor