[review] Cross-Domain and Cross-Modal Knowledge Distillation in Domain Adaptation for 3D Semantic Segmentation (MM 2022)
Overview of the Paper
This paper addresses the problem of Unsupervised Domain Adaptation (UDA) for 3D Semantic Segmentation. Traditional methods attempt to bridge the domain gap between source and target datasets using adversarial learning, but these approaches suffer from unstable training, sensitivity to hyperparameters, and limited effectiveness.
To overcome these issues, the paper proposes Dual-Cross, a novel framework that incorporates:
- Cross-Domain Knowledge Distillation (CDKD) – Helps the model better perceive the target domain while training on source data.
- Cross-Modal Knowledge Distillation (CMKD) – Enhances interaction between 2D images and 3D point clouds, leveraging their complementary nature.
- Multi-Modal Style Transfer (MMST) – Converts source data into the target domain style to improve adaptation.
Problems and Solutions
1. Problem: Limitations of Existing Domain Adaptation Approaches
- Traditional UDA methods rely on adversarial learning to align domain features.
- However, adversarial learning is unstable, difficult to train, and sensitive to hyperparameters.
- 3D point clouds have varying densities, making simple feature alignment ineffective.
Solution
- Cross-Domain Knowledge Distillation (CDKD) (a minimal sketch follows this list):
  - Introduces a target-aware teacher network that learns target-domain characteristics.
  - While the student model is trained on source data, the teacher's predictions guide its adaptation to the target domain.
- Effect:
  - More stable and effective training than adversarial learning.
  - Reduces the domain gap, allowing the model to retain high accuracy in the target domain.
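To make the CDKD idea concrete, here is a minimal PyTorch sketch of a distillation loss between a target-aware teacher and a student trained on source data. The function name, the temperature `T`, and the commented training step are my own illustrative assumptions, not the paper's exact formulation.

```python
import torch.nn.functional as F

def cdkd_loss(student_logits, teacher_logits, T=2.0):
    """KL distillation from the target-aware teacher to the student.

    student_logits, teacher_logits: (N, C) per-point class logits.
    T: softmax temperature; softer targets expose inter-class structure.
    """
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    # KL(teacher || student), scaled by T^2 as in standard distillation.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

# Hypothetical training step: the student sees source data and labels, while the
# teacher (trained to perceive the target domain) supplies soft targets.
# loss = F.cross_entropy(student_logits, source_labels) \
#        + lambda_kd * cdkd_loss(student_logits, teacher_logits.detach())
```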
2. Problem: Lack of Information Exchange Between Modalities
- Existing methods do not fully exploit the relationship between 2D images and 3D point clouds.
- For example, 2D images are sensitive to lighting changes while 3D LiDAR is largely unaffected, yet current methods fail to leverage this complementary nature.
Solution
- Cross-Modal Knowledge Distillation (CMKD) (see the sketch after this list):
  - Uses 2D and 3D data jointly during training.
  - Builds a hybrid-modal prediction, enabling mutual learning between the 2D and 3D models.
- Effect:
  - More robust segmentation, even under lighting variations or density differences.
  - Each modality compensates for the weaknesses of the other.
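As a rough illustration of CMKD, the sketch below fuses the 2D and 3D branch predictions into a hybrid-modal prediction and distills each branch toward it, so the modalities teach each other. The simple probability average used for fusion is an assumption on my part; the paper's fusion scheme may differ.

```python
import torch.nn.functional as F

def cmkd_losses(logits_2d, logits_3d):
    """Cross-modal distillation through a hybrid prediction (a sketch).

    logits_2d, logits_3d: (N, C) logits for the same N points from each branch.
    """
    # Hybrid-modal prediction: fuse the two branches' probabilities
    # (averaging is an assumed fusion; the paper may use another scheme).
    p_hybrid = 0.5 * (F.softmax(logits_2d, dim=1) + F.softmax(logits_3d, dim=1))
    p_hybrid = p_hybrid.detach()  # treat the fused prediction as the teacher
    # Pull each modality toward the hybrid, so 2D and 3D teach each other.
    loss_2d = F.kl_div(F.log_softmax(logits_2d, dim=1), p_hybrid, reduction="batchmean")
    loss_3d = F.kl_div(F.log_softmax(logits_3d, dim=1), p_hybrid, reduction="batchmean")
    return loss_2d, loss_3d
```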
3. Problem: Style Differences Between Source and Target Domains
- Different datasets (e.g., day vs. night, different geographic regions) exhibit major variations in lighting, object appearance, and point cloud density.
- Simple domain alignment methods fail to effectively address these style discrepancies.
Solution
- Multi-Modal Style Transfer (MMST) (a sketch of both parts follows this list):
  - For 2D images: uses the Fast Fourier Transform (FFT) to impose the target domain's style while preserving source content.
  - For 3D point clouds: uses Density Transfer (DT) to match the density of source point clouds to that of the target domain.
- Effect:
  - Lets the model adapt to the target domain more naturally.
  - Improves 2D and 3D feature extraction, leading to better performance.
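The sketch below illustrates both halves of MMST under stated assumptions: an FFT amplitude swap in the spirit of Fourier-based style transfer for the 2D part (the band-size parameter `beta` is assumed), and a density transfer for the 3D part implemented here as uniform random subsampling, which is only a stand-in for the paper's actual DT.

```python
import torch

def fft_style_transfer(src_img, tgt_img, beta=0.01):
    """Swap the low-frequency amplitude of a source image with a target image's,
    keeping the source phase (content). `beta` controls the swapped band.

    src_img, tgt_img: (C, H, W) float tensors of the same size.
    """
    fft_src = torch.fft.fft2(src_img)
    fft_tgt = torch.fft.fft2(tgt_img)
    amp_src, pha_src = fft_src.abs(), fft_src.angle()
    amp_tgt = fft_tgt.abs()
    # Center the spectra so the low-frequency band is an easy crop.
    amp_src = torch.fft.fftshift(amp_src, dim=(-2, -1))
    amp_tgt = torch.fft.fftshift(amp_tgt, dim=(-2, -1))
    _, H, W = src_img.shape
    h, w = max(1, int(beta * H)), max(1, int(beta * W))
    cy, cx = H // 2, W // 2
    # Replace the source's low-frequency amplitude (style) with the target's.
    amp_src[:, cy - h:cy + h, cx - w:cx + w] = amp_tgt[:, cy - h:cy + h, cx - w:cx + w]
    amp_src = torch.fft.ifftshift(amp_src, dim=(-2, -1))
    # Recombine target style (amplitude) with source content (phase).
    return torch.fft.ifft2(amp_src * torch.exp(1j * pha_src)).real

def density_transfer(src_points, keep_ratio):
    """Match source point-cloud density to the target by random subsampling.

    src_points: (N, 3) tensor; keep_ratio: target density / source density.
    Uniform dropping is an assumption, not the paper's exact DT.
    """
    n_keep = int(src_points.shape[0] * keep_ratio)
    idx = torch.randperm(src_points.shape[0])[:n_keep]
    return src_points[idx]
```

Swapping only the low-frequency amplitude changes global appearance (illumination, color statistics) while the phase keeps scene structure, which is why a stylized day image can look night-like without losing its content.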
Experimental Results
The paper evaluates Dual-Cross on multiple domain adaptation settings, showing significant performance gains over previous methods (xMUDA, DsCML).
- Day → Night Adaptation
  - +6.1% improvement in 2D and +2.0% in 3D over xMUDA.
  - +4.3% in 2D and +0.5% in 3D over DsCML.
  - Effectively mitigates misclassification under night conditions.
- A2D2 → SemanticKITTI (Dataset-to-Dataset Adaptation)
  - +6.2% improvement in 2D and +3.8% in 3D over xMUDA.
  - Correctly classifies bicycles and sidewalks, which previous models misclassified.
- USA → Singapore (Country-to-Country Adaptation)
  - Slightly outperforms xMUDA and is comparable to DsCML.
- Computational Efficiency
  - Achieves better performance at lower computational cost than adversarial approaches.
  - Unlike adversarial learning, training is stable and efficient.
Conclusion
Dual-Cross effectively addresses the challenges in 3D semantic segmentation domain adaptation, outperforming existing methods.
- Cross-Domain Knowledge Distillation (CDKD) → Guides the student model by incorporating target domain information.
- Cross-Modal Knowledge Distillation (CMKD) → Utilizes 2D and 3D modalities to enhance segmentation.
- Multi-Modal Style Transfer (MMST) → Applies style transfer techniques to better align source and target domains.
These innovations result in a more effective and robust 3D segmentation model, especially in challenging environments with lighting variations and sensor differences.
Review
- I picked this paper because it uses the FFT for 3D semantic segmentation; it turns out FFT-based style transfer is already widely applied to 2D in the domain adaptation field.
- Since the idea I am currently exploring is combining 2D and 3D information, I hoped to take away ideas from applying knowledge distillation to domain adaptation (and the Day → Night setting would fit my own field well).
- Knowledge distillation is usually performed directly between the 2D and 3D branches, but here it is performed between the hybrid prediction and 2D, and between the hybrid prediction and 3D.