author: | biofizzatreya |
score: | 9 / 10 |
Summarize the paper: The paper proposes converting image-based depth estimates, obtained with convolutional networks from stereo images, into a LIDAR-style point cloud representation for 3D object detection. The motivation is that distant objects appear small in the image, so conventional conv-nets applied directly to the image fail to detect them properly.
- How is it realized (technically)? The algorithm first uses a pair of stereoscopic images to construct a depth map: the depth of each pixel is the camera's focal length times the stereo baseline, divided by the estimated disparity at that pixel.
This depth map is then used to compute the x, y and z coordinates of every pixel, which together form a point cloud representation.
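Written out, the two steps above take the following form under a standard pinhole stereo model. The symbol names are chosen here for illustration: $Y(u, v)$ is the disparity at pixel $(u, v)$, $b$ the stereo baseline, $f_U$ and $f_V$ the horizontal and vertical focal lengths, and $(c_U, c_V)$ the principal point.

$$
D(u, v) = \frac{f_U \, b}{Y(u, v)}
$$

$$
z = D(u, v), \qquad x = \frac{(u - c_U)\, z}{f_U}, \qquad y = \frac{(v - c_V)\, z}{f_V}
$$

Because depth is inversely proportional to disparity, a fixed disparity error produces a larger depth error for distant objects than for nearby ones.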
The point cloud is called a pseudo-LIDAR signal. The pseudo-LIDAR representation, along with the monocular images, is fed into existing 3D object detection pipelines.
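The conversion from a disparity map to a pseudo-LIDAR point cloud can be sketched in a few lines of NumPy. This is a minimal illustration assuming a pinhole camera model, not the authors' implementation; the function names and the parameters `focal_u`, `focal_v`, `c_u`, `c_v` and `baseline` are hypothetical and would come from the camera calibration in practice.

```python
import numpy as np


def disparity_to_depth(disparity, focal_u, baseline, eps=1e-6):
    """Convert a disparity map (pixels) to a depth map (same unit as baseline).

    Pixels with near-zero or negative disparity get infinite depth.
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full(disparity.shape, np.inf)
    valid = disparity > eps
    depth[valid] = focal_u * baseline / disparity[valid]
    return depth


def depth_to_point_cloud(depth, focal_u, focal_v, c_u, c_v):
    """Back-project every pixel with a finite, positive depth into 3D.

    Returns an (N, 3) array of (x, y, z) points in the camera frame.
    """
    depth = np.asarray(depth, dtype=np.float64)
    rows, cols = depth.shape
    # u indexes columns (horizontal), v indexes rows (vertical).
    u, v = np.meshgrid(np.arange(cols), np.arange(rows))
    valid = np.isfinite(depth) & (depth > 0)
    z = depth[valid]
    x = (u[valid] - c_u) * z / focal_u
    y = (v[valid] - c_v) * z / focal_v
    return np.stack([x, y, z], axis=1)
```

The resulting array has one row per valid pixel, which is the format a point-cloud-based detector typically expects in place of real LIDAR returns.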
- How well does the paper perform?
The paper does not develop a new neural network architecture; instead, it applies existing 3D object detection architectures to the pseudo-LIDAR point cloud data. The approach is evaluated on the KITTI dataset. The pseudo-LIDAR point cloud is also compared against real LIDAR data to assess its accuracy.
- What interesting variants are explored? The paper also attempts to detect pedestrians and cyclists, which is a much harder task because they occupy far fewer pixels in the image than cars. Following the KITTI convention, these classes are evaluated at an IoU threshold of 0.5, compared to 0.7 for cars.
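The values 0.5 and 0.7 match the overlap (IoU) thresholds that KITTI uses to decide whether a detection counts as correct: 0.7 for cars and 0.5 for pedestrians and cyclists. The sketch below shows how such a class-specific threshold works. It assumes axis-aligned boxes for simplicity, whereas the KITTI benchmark uses rotated boxes, and all names in it are hypothetical.

```python
import numpy as np

# KITTI-style overlap thresholds per category.
IOU_THRESHOLD = {"car": 0.7, "pedestrian": 0.5, "cyclist": 0.5}


def iou_3d_axis_aligned(box_a, box_b):
    """IoU of two axis-aligned 3D boxes.

    Each box is (x_min, y_min, z_min, x_max, y_max, z_max).
    Simplification: real KITTI evaluation handles rotated boxes.
    """
    a = np.asarray(box_a, dtype=np.float64)
    b = np.asarray(box_b, dtype=np.float64)
    lower = np.maximum(a[:3], b[:3])
    upper = np.minimum(a[3:], b[3:])
    # Clip at zero so non-overlapping boxes give zero intersection.
    intersection = np.prod(np.clip(upper - lower, 0.0, None))
    volume_a = np.prod(a[3:] - a[:3])
    volume_b = np.prod(b[3:] - b[:3])
    return intersection / (volume_a + volume_b - intersection)


def is_true_positive(pred_box, gt_box, category):
    """A detection counts as correct if its IoU reaches the class threshold."""
    return iou_3d_axis_aligned(pred_box, gt_box) >= IOU_THRESHOLD[category]
```

The same predicted box can therefore pass as a pedestrian detection while failing as a car detection, so scores for the two groups of classes are not directly comparable.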
TL;DR
- A method to convert images to LIDAR-like point clouds
- Pseudo-LIDAR brings image-based 3D object detection closer to the level of detection on LIDAR data
- Combined with monocular images, it makes 3D object detection less expensive than relying on a LIDAR sensor