PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space, Qi, Yi, Su, Guibas; 2017 - Summary
author: ishank-arora
score: 9 / 10
What is the core idea?
- PointNet was a pioneer in applying deep learning directly to point sets
- However, by design, PointNet does not capture local structures, which limits its ability to recognize fine-grained patterns and to generalize to complex scenes
- PointNet++ is a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set
- This lets the network learn local features at increasing contextual scales
- The design is analogous to a CNN, which progressively abstracts local patterns over growing receptive fields
How is it realized (technically)?
- Hierarchical structure
- Where PointNet uses a single max pooling operation, PointNet++ builds a hierarchical grouping of points and progressively abstracts larger and larger local regions along the hierarchy
- The hierarchical structure is composed of a number of set abstraction levels
- At each level, a set of points is processed and abstracted to produce a new set with fewer elements
- Each set abstraction level consists of three layers:
- Sampling Layer
- Selects a subset of the input points that defines the centroids of local regions
- Done using iterative farthest point sampling (FPS); a minimal sketch follows these bullets
- Gives better coverage of the entire point set than random sampling
- In contrast to a CNN, which scans the vector space agnostic of data distribution, FPS generates receptive fields in a data-dependent manner
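A minimal NumPy sketch of iterative FPS, assuming the input is an (N, 3) array of xyz coordinates; the function name and the random choice of the first centroid are illustrative, not taken from the authors' implementation.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Iteratively pick the point farthest from all centroids chosen so far."""
    n = points.shape[0]
    chosen = np.zeros(n_samples, dtype=np.int64)
    min_dist = np.full(n, np.inf)        # distance to the nearest chosen centroid
    chosen[0] = np.random.randint(n)     # arbitrary starting point
    for i in range(1, n_samples):
        d = np.linalg.norm(points - points[chosen[i - 1]], axis=1)
        min_dist = np.minimum(min_dist, d)
        chosen[i] = np.argmax(min_dist)  # the point farthest from the current set
    return chosen                        # indices of the selected centroids
```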
- Grouping Layer
- Constructs local region sets by finding “neighboring” points around the centroids
- In a CNN, the local region of a pixel consists of the pixels within a certain Manhattan distance (the kernel size) of that pixel
- Here, the neighbourhood of a point is defined by metric distance, e.g. a ball query that gathers the points within a given radius of each centroid (sketched below)
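A matching NumPy sketch of ball-query grouping; capping each region at `max_neighbors` and padding sparse regions are in the spirit of the paper, but the exact signature and padding strategy are assumptions.

```python
import numpy as np

def ball_query(points, centroids, radius, max_neighbors):
    """Group up to max_neighbors points within `radius` of each centroid."""
    groups = np.zeros((centroids.shape[0], max_neighbors), dtype=np.int64)
    for i, c in enumerate(centroids):
        dist = np.linalg.norm(points - c, axis=1)
        idx = np.flatnonzero(dist < radius)[:max_neighbors]
        if idx.size == 0:
            idx = np.array([np.argmin(dist)])  # fall back to the nearest point
        groups[i, :idx.size] = idx
        groups[i, idx.size:] = idx[0]          # pad regions with fewer neighbours
    return groups                              # (M, max_neighbors) index array
```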
- PointNet Layer
- A mini-PointNet encodes local region patterns into feature vectors
- The coordinates of points in a local region are translated into a local frame relative to the centroid
- This allows point-to-point relations within the local region to be captured (sketched below)
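A simplified sketch of the mini-PointNet step, using plain matrices as stand-ins for the shared MLP (the real layers use learned weights, batch normalisation, etc.); the final snippet chains it with the FPS and ball-query sketches above into one illustrative set abstraction level.

```python
import numpy as np

def mini_pointnet(points, centroids, groups, weights):
    """Encode each local region into one feature vector (illustrative only)."""
    # Translate neighbours into the local frame of their centroid.
    local = points[groups] - centroids[:, None, :]   # (M, K, 3)
    h = local
    for W in weights:                                # shared per-point "MLP"
        h = np.maximum(h @ W, 0.0)                   # linear layer + ReLU
    return h.max(axis=1)                             # max pool over the K neighbours

# Illustrative use of one set abstraction level (reuses the sketches above):
rng = np.random.default_rng(0)
pts = rng.random((1024, 3))
cen = pts[farthest_point_sampling(pts, 128)]                  # sampling layer
grp = ball_query(pts, cen, radius=0.2, max_neighbors=32)      # grouping layer
mlp = [rng.normal(size=(3, 64)), rng.normal(size=(64, 128))]  # stand-in weights
feats = mini_pointnet(pts, cen, grp, mlp)                     # (128, 128) local features
```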
- Non-uniform sampling density
- Point sets typically come with non-uniform density, so features learned at a single scale may not transfer between dense and sparse regions
- PointNet++ therefore intelligently extracts local patterns at multiple scales and combines them at each abstraction level
- Two types of density adaptive layers:
- Multi-scale grouping (MSG):
- Features extracted at different scales (grouping radii) are simply concatenated to form a multi-scale feature (see the sketch after this list)
- The network is trained with random input dropout so that it learns an optimised strategy for combining the multi-scale features
- Multi-resolution grouping (MRG):
- MSG is computationally expensive because it runs a local PointNet over large-scale neighbourhoods for every centroid
- MRG instead concatenates two vectors: one summarising the features of the lower abstraction level and one obtained by running a PointNet directly on the raw points of the region
- This avoids the expensive computation while still preserving the ability to adaptively aggregate information according to the distributional properties of the points
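A rough sketch of the MSG idea, building on the `ball_query` and `mini_pointnet` sketches above; the radii and per-scale weight sets are placeholder assumptions standing in for the learned multi-scale MLPs.

```python
import numpy as np

def multi_scale_features(points, centroids, radii, max_neighbors, weight_sets):
    """Concatenate local features extracted at several radii (MSG-style)."""
    feats = []
    for r, weights in zip(radii, weight_sets):
        groups = ball_query(points, centroids, r, max_neighbors)
        feats.append(mini_pointnet(points, centroids, groups, weights))
    return np.concatenate(feats, axis=1)   # (M, sum of per-scale feature widths)

# e.g. three scales, each with its own (here reused) stand-in weight set:
# msg = multi_scale_features(pts, cen, radii=[0.1, 0.2, 0.4],
#                            max_neighbors=32, weight_sets=[mlp, mlp, mlp])
```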
How well does the paper perform?
- Experiments on 4 datasets: MNIST and ModelNet40 (object classification), SHREC15 (non-rigid shape classification), and ScanNet (semantic scene labelling)
- Outperforms PointNet and achieves state-of-the-art results on these benchmarks at the time of publication
- The density-adaptive layers (MSG/MRG) make the network noticeably more robust when test points are randomly dropped
TL;DR
- An extension of PointNet that learns local features, enabling better recognition of fine-grained patterns and better generalizability to complex scenes
- Learns features at increasing contextual scales through a hierarchy of set abstraction levels
- Achieves state-of-the-art performance by using two novel density-adaptive set abstraction layers (MSG and MRG) to intelligently aggregate multi-scale information