[review] ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation (ICLR 2025)
This paper addresses the challenges of Continual Panoptic Segmentation (CPS), a task that requires models to perform both semantic and instance segmentation over a continuous stream of learning tasks. Existing methods for CPS suffer from catastrophic forgetting, scalability issues, and inefficient adaptation to new classes.
To tackle these problems, the paper proposes ADAPT, a novel framework that integrates:
- Attentive Self-Distillation – an adaptive knowledge distillation strategy that re-weights each query by the teacher's confidence, so distillation focuses on informative predictions rather than background.
- Dual-Decoder Prediction Fusion – a mechanism that balances base and novel class retention without scale inconsistency.
- Efficient Adaptation Strategy – freezing the image encoder and pixel decoder while fine-tuning selected transformer decoder components for efficiency and flexibility.
Problems and Solutions
1. Problem: Catastrophic Forgetting in Continual Learning
- When learning new tasks incrementally, models often forget previously learned knowledge.
- This is due to the absence of prior class labels during new training stages, leading to a bias toward newly learned classes.
Solution
- Attentive Self-Distillation: Instead of treating all pixels or instances equally, ADAPT re-weights the importance of each query using the background confidence from the teacher model (a rough sketch follows this list).
- Effect: Reduces forgetting of old classes without letting newly learned classes override past knowledge.
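Below is a minimal PyTorch sketch of what such query-wise re-weighting could look like, assuming the teacher and student each emit per-query class logits with the final index reserved for the no-object/background class; the function name and the specific weighting form are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attentive_distillation_loss(student_logits, teacher_logits):
    """Query-wise re-weighted distillation.

    student_logits, teacher_logits: [num_queries, num_old_classes + 1],
    where the last index is the no-object / background class.
    """
    with torch.no_grad():
        teacher_probs = teacher_logits.softmax(dim=-1)
        bg_conf = teacher_probs[:, -1]          # teacher's background confidence per query
        weights = 1.0 - bg_conf                 # down-weight queries the teacher treats as background

    # Per-query KL divergence between teacher and student class distributions
    kl = F.kl_div(
        student_logits.log_softmax(dim=-1),
        teacher_probs,
        reduction="none",
    ).sum(dim=-1)                               # [num_queries]

    return (weights * kl).sum() / weights.sum().clamp(min=1e-6)
```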
2. Problem: High Computational Cost of Knowledge Distillation
- Traditional Knowledge Distillation (KD) requires separate forward passes for teacher and student models, significantly increasing computational overhead.
- Methods like CoMFormer use full model fine-tuning, which is computationally expensive and inefficient.
Solution
- Shared Forward Passes: ADAPT freezes the image encoder and pixel decoder, allowing a single forward pass through them to be shared between the teacher and student models (sketched after this list).
- Effect: Reduces training time by 34.8% compared to CoMFormer and eliminates redundant computations.
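A rough sketch of the shared-pass setup, assuming a Mask2Former-style pipeline (image encoder → pixel decoder → transformer decoder); the module names and call signatures here are placeholders, not the paper's actual code.

```python
import torch

@torch.no_grad()
def shared_features(encoder, pixel_decoder, images):
    # Both modules are frozen, so their output can be computed once
    # and reused by the teacher and student transformer decoders.
    return pixel_decoder(encoder(images))

def training_step(encoder, pixel_decoder, teacher_decoder, student_decoder, images):
    feats = shared_features(encoder, pixel_decoder, images)  # single shared forward pass
    with torch.no_grad():
        teacher_preds = teacher_decoder(feats)                # frozen old-task decoder (distillation targets)
    student_preds = student_decoder(feats)                    # trainable decoder learning the new classes
    return teacher_preds, student_preds
```

Because the frozen encoder and pixel decoder produce identical features for teacher and student, distillation no longer needs a second full forward pass through the heaviest parts of the network.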
3. Problem: Loss of Model Flexibility (Plasticity) in Continual Learning
- Some approaches (e.g., ECLIPSE) improve retention of old classes by freezing most model weights, but fail to generalize well to new tasks due to restricted model plasticity.
Solution
- Selective Fine-Tuning of the Transformer Decoder (see the sketch after this list):
  - Only the cross-attention layers and feed-forward networks (FFNs) in the transformer decoder are fine-tuned.
  - Self-attention layers stay fixed, balancing stability (retaining old knowledge) against plasticity (learning new knowledge).
  - Effect: Provides strong generalization to new classes while avoiding overfitting.
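In PyTorch, this selective freezing reduces to a simple parameter filter. The sketch below assumes parameter names contain identifiable "transformer_decoder", "cross_attn", and "ffn" substrings, as in common Mask2Former-style codebases; the actual names depend on the implementation.

```python
import torch.nn as nn

def configure_trainable_params(model: nn.Module):
    """Freeze everything, then unfreeze only the cross-attention and FFN
    weights inside the transformer decoder."""
    for p in model.parameters():
        p.requires_grad = False          # encoder, pixel decoder, self-attention all stay frozen

    for name, p in model.named_parameters():
        in_decoder = "transformer_decoder" in name
        tunable = ("cross_attn" in name) or ("ffn" in name)
        if in_decoder and tunable:
            p.requires_grad = True       # only these parameters receive gradient updates
```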
4. Problem: Inconsistency in Probability Scores for Base and Novel Classes
- In multi-step continual learning, knowledge distillation errors accumulate over time, making base and novel class probabilities inconsistent.
- Simple Probability-Level Fusion (PLF) leads to mismatches in scale, reducing accuracy.
Solution
- Query-Level Fusion (QLF):
  - Instead of merging probability scores directly, QLF keeps separate decoders for base and novel classes and fuses their per-query outputs (see the sketch after this list).
  - Effect: Avoids scale mismatches between base and novel probabilities and ensures accurate segmentation, outperforming probability-level fusion.
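A minimal sketch of one plausible reading of query-level fusion: each decoder's per-query class logits are lifted into the joint label space and the two query sets are simply concatenated, so base and novel scores never have to be compared on a common scale. Tensor shapes and the lifting scheme are assumptions, not the paper's verbatim implementation.

```python
import torch

def query_level_fusion(base_cls, base_masks, novel_cls, novel_masks,
                       num_base, num_novel):
    """Fuse base- and novel-decoder outputs at the query level.

    base_cls:  [Qb, num_base + 1]  class logits (last index = no-object)
    novel_cls: [Qn, num_novel + 1]
    *_masks:   [Q, H, W] mask logits from the corresponding decoder
    """
    total = num_base + num_novel + 1
    neg = torch.finfo(base_cls.dtype).min

    # Lift each decoder's logits into the joint class space; classes a
    # decoder does not predict get a very low score.
    lifted_base = base_cls.new_full((base_cls.shape[0], total), neg)
    lifted_base[:, :num_base] = base_cls[:, :num_base]
    lifted_base[:, -1] = base_cls[:, -1]

    lifted_novel = novel_cls.new_full((novel_cls.shape[0], total), neg)
    lifted_novel[:, num_base:num_base + num_novel] = novel_cls[:, :num_novel]
    lifted_novel[:, -1] = novel_cls[:, -1]

    # Concatenate the two query sets; base and novel probabilities are
    # never merged on a shared scale.
    fused_cls = torch.cat([lifted_base, lifted_novel], dim=0)    # [Qb + Qn, total]
    fused_masks = torch.cat([base_masks, novel_masks], dim=0)    # [Qb + Qn, H, W]
    return fused_cls, fused_masks
```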
Experimental Results
The paper evaluates ADAPT on ADE20K, a standard benchmark for panoptic segmentation.
- Improved Performance Across All Metrics:
  - ADAPT outperforms CoMFormer and ECLIPSE in both base and novel class segmentation.
  - PQ (Panoptic Quality) on base classes: +6.2 points over CoMFormer.
  - PQ on novel classes: +9.8 points over CoMFormer.
- Faster Training with Less Computational Cost:
  - ADAPT requires 2.27 hours for training, 34.8% faster than CoMFormer.
  - Uses fewer training iterations (4k vs. 16k in ECLIPSE) while achieving superior performance.
- More Efficient Inference:
  - Inference is 6.5% more efficient than ECLIPSE while improving novel-class performance by 59.4%.
Conclusion
ADAPT provides a well-balanced approach for continual panoptic segmentation, addressing the key challenges of catastrophic forgetting, scalability, and computational efficiency.
- Attentive Self-Distillation enhances knowledge retention while learning new tasks.
- Dual-Decoder Prediction Fusion ensures smooth integration of base and novel classes.
- Efficient Adaptation Strategy minimizes computational overhead while maintaining model flexibility.
These innovations position ADAPT as a state-of-the-art solution for continual learning in panoptic segmentation, offering a practical and scalable framework for real-world applications.