| author: | liyanc |
| ------- | ------- |
| score:  | 6 / 10  |
- What is the core idea?
Point cloud representation learning suffers from poor topology preservation and permutation variance. To take advantage of existing CNN machinery, the authors propose a learned $\chi$-transformation that simultaneously weights and permutes the input features; the resulting operator is named $\chi$-Conv.
- How is it realized (technically)?
$\chi$-Conv operator
- Project the neighboring points into a local frame centered at the representative point
- Lift the local coordinates into a local feature via an MLP
- Concatenate the local feature with the input feature
- Learn a K x K matrix from the local coordinates, namely the $\chi$-transform
- Apply $\chi$ to the concatenated features, thereby weighting and permuting them
- Finally, convolve the transformed features with a kernel $K$
Implementation details are illustrated in the following figures.
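To make the steps above concrete, here is a minimal PyTorch sketch of one $\chi$-Conv step for a single representative point. The module name `XConvSingle` and the MLP sizes are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a single X-Conv step (not the authors' code).
import torch
import torch.nn as nn

class XConvSingle(nn.Module):
    """One X-Conv step for a single representative point and its K neighbors."""
    def __init__(self, k, c_in, c_delta, c_out):
        super().__init__()
        self.k = k
        # MLP_delta: lift the 3-d local coordinates to point-wise features (c_delta)
        self.mlp_delta = nn.Sequential(
            nn.Linear(3, c_delta), nn.ReLU(),
            nn.Linear(c_delta, c_delta), nn.ReLU())
        # MLP that predicts the K x K X-transformation from the local coordinates
        self.mlp_x = nn.Sequential(
            nn.Linear(3 * k, k * k), nn.ReLU(),
            nn.Linear(k * k, k * k))
        # Final convolution kernel applied to the transformed neighborhood features
        self.conv = nn.Linear(k * (c_delta + c_in), c_out)

    def forward(self, p, neighbors, feats):
        # p: (3,) representative point; neighbors: (K, 3); feats: (K, c_in)
        local = neighbors - p                         # project into the local frame
        f_delta = self.mlp_delta(local)               # (K, c_delta) lifted features
        f_star = torch.cat([f_delta, feats], dim=-1)  # (K, c_delta + c_in)
        x = self.mlp_x(local.reshape(-1)).view(self.k, self.k)  # K x K X-transform
        f_x = x @ f_star                              # weight and permute the features
        return self.conv(f_x.reshape(-1))             # (c_out,) output feature
```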
Permutation invariance and orientation invariance
Since the neighboring points are projected into a local frame and the reweighting matrix is learned from the input, the combined $\chi$-transformation handles orientation variance by reorienting the patch. Additionally, because the point ordering is explicitly rearranged by the $\chi$-transformation, the operator is expected to be permutation invariant as well.
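As a rough sketch of the invariance argument, in the ideal case the learned transform tracks the input ordering: if permuting the neighbors by a permutation matrix $\Pi$ leads the network to produce $\chi_\Pi = \chi \Pi^\top$, then $\chi_\Pi (\Pi F_*) = \chi \Pi^\top \Pi F_* = \chi F_*$, so the output is unchanged. In practice the learned $\chi$ only approximates this behavior, so the invariance is approximate rather than exact.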
Putting it together
Raw points are grouped around representative center points and processed layer by layer. For a classification task, the network is built in a purely contracting (downsampling) manner; a segmentation network, on the other hand, is built in a first contracting and then expanding manner (hourglass). The following figure shows the architecture.
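Below is a minimal sketch of how such a layer could be stacked hierarchically, reusing the hypothetical `XConvSingle` module from the earlier sketch; random subsampling stands in for the paper's representative-point selection, and all sizes are illustrative assumptions.

```python
# Hypothetical sketch of stacking X-Conv layers (contracting part of the hourglass).
import torch

def xconv_layer(xconv, points, feats, num_reps, k):
    """Apply an X-Conv step around each of `num_reps` representative points."""
    # points: (N, 3), feats: (N, c_in)
    rep_idx = torch.randperm(points.shape[0])[:num_reps]  # stand-in for the paper's sampling
    reps = points[rep_idx]                                 # (num_reps, 3)
    dists = torch.cdist(reps, points)                      # (num_reps, N) pairwise distances
    knn_idx = dists.topk(k, largest=False).indices         # K nearest neighbors per rep point
    out = torch.stack([xconv(reps[i], points[knn_idx[i]], feats[knn_idx[i]])
                       for i in range(num_reps)])          # (num_reps, c_out)
    return reps, out

# Each layer aggregates local neighborhoods into fewer, higher-dimensional points.
pts = torch.randn(1024, 3)
fts = torch.randn(1024, 4)                                 # e.g. extra per-point channels
layer1 = XConvSingle(k=8, c_in=4, c_delta=16, c_out=32)    # from the earlier sketch
layer2 = XConvSingle(k=8, c_in=32, c_delta=16, c_out=64)
pts1, fts1 = xconv_layer(layer1, pts, fts, num_reps=256, k=8)
pts2, fts2 = xconv_layer(layer2, pts1, fts1, num_reps=64, k=8)
```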
- How well does the paper perform? The authors run experiments on both classification and segmentation. Classification is evaluated on ModelNet40 and ScanNet, with the results shown below, which are on par with the SOTA.
Segmentation is evaluated on ShapeNet Parts, S3DIS, and ScanNet, with the results shown below, which lead the SOTA.
- What interesting variants are explored? The authors ablate the $\chi$-transformation and analyze its effect on feature separation with t-SNE, as shown below. It's clear that the proposed transformation improves the feature separation.
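For reference, a minimal sketch of the kind of t-SNE comparison described above, assuming scikit-learn and matplotlib; the feature matrices and labels here are hypothetical placeholders, not features from the actual models.

```python
# Hypothetical sketch: compare feature separation with and without the X-transform.
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
feats_without_x = rng.normal(size=(500, 128))   # placeholder feature matrices
feats_with_x = rng.normal(size=(500, 128))
labels = rng.integers(0, 10, size=500)          # placeholder class labels

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, feats, title in [(axes[0], feats_without_x, "without X-transform"),
                         (axes[1], feats_with_x, "with X-transform")]:
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(feats)
    ax.scatter(emb[:, 0], emb[:, 1], c=labels, s=4, cmap="tab10")
    ax.set_title(title)
plt.show()
```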
TL;DR
- Group points w.r.t. representative centers and project them into the local frame
- Learn a permutation-and-reweighting transformation on the points in the local frame
- Build hierarchical models like CNNs