[review] ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation (ICLR 2025)
This paper addresses the challenges of Continual Panoptic Segmentation (CPS), a task that requires models to perform both semantic and instance segmentation over a continuous stream of learning tasks. Existing methods for CPS suffer from catastrophic forgetting, scalability issues, and inefficient adaptation to new classes.
To tackle these problems, the paper proposes ADAPT, a novel framework that integrates:
- Attentive Self-Distillation – an adaptive knowledge distillation strategy that re-weights each query by the teacher's confidence, so distillation focuses on informative predictions rather than background.
- Dual-Decoder Prediction Fusion – a mechanism that balances base and novel class retention without scale inconsistency.
- Efficient Adaptation Strategy – freezing the image encoder and pixel decoder while fine-tuning selected transformer decoder components for efficiency and flexibility.
Problems and Solutions
1. Problem: Catastrophic Forgetting in Continual Learning
- When learning new tasks incrementally, models often forget previously learned knowledge.
- This is due to the absence of prior class labels during new training stages, leading to a bias toward newly learned classes.
Solution
- Attentive Self-Distillation: Instead of treating all pixels or instances equally, ADAPT re-weights the importance of each query using the background confidence from the teacher model (a rough sketch follows this list).
- Effect: Reduces forgetting of old classes without letting newly learned classes override past knowledge.
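Below is a minimal PyTorch sketch of what such query-wise re-weighting could look like, assuming the teacher and student each emit per-query class logits with the final index reserved for the no-object/background class; the function name and the specific weighting form are illustrative, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attentive_distillation_loss(student_logits, teacher_logits):
    """Query-wise re-weighted distillation.

    student_logits, teacher_logits: [num_queries, num_old_classes + 1],
    where the last index is the no-object / background class.
    """
    with torch.no_grad():
        teacher_probs = teacher_logits.softmax(dim=-1)
        bg_conf = teacher_probs[:, -1]          # teacher's background confidence per query
        weights = 1.0 - bg_conf                 # down-weight queries the teacher treats as background

    # Per-query KL divergence between teacher and student class distributions
    kl = F.kl_div(
        student_logits.log_softmax(dim=-1),
        teacher_probs,
        reduction="none",
    ).sum(dim=-1)                               # [num_queries]

    return (weights * kl).sum() / weights.sum().clamp(min=1e-6)
```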
2. Problem: High Computational Cost of Knowledge Distillation
- Traditional Knowledge Distillation (KD) requires separate forward passes for teacher and student models, significantly increasing computational overhead.
- Methods like CoMFormer use full model fine-tuning, which is computationally expensive and inefficient.
Solution
- Shared Forward Passes: ADAPT freezes the image encoder and pixel decoder, allowing a single forward pass through them to be shared between the teacher and student models (sketched after this list).
- Effect: Reduces training time by 34.8% compared to CoMFormer and eliminates redundant computations.
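A rough sketch of the shared-pass setup, assuming a Mask2Former-style pipeline (image encoder → pixel decoder → transformer decoder); the module names and call signatures here are placeholders, not the paper's actual code.

```python
import torch

@torch.no_grad()
def shared_features(encoder, pixel_decoder, images):
    # Both modules are frozen, so their output can be computed once
    # and reused by the teacher and student transformer decoders.
    return pixel_decoder(encoder(images))

def training_step(encoder, pixel_decoder, teacher_decoder, student_decoder, images):
    feats = shared_features(encoder, pixel_decoder, images)  # single shared forward pass
    with torch.no_grad():
        teacher_preds = teacher_decoder(feats)                # frozen old-task decoder (distillation targets)
    student_preds = student_decoder(feats)                    # trainable decoder learning the new classes
    return teacher_preds, student_preds
```

Because the frozen encoder and pixel decoder produce identical features for teacher and student, distillation no longer needs a second full forward pass through the heaviest parts of the network.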
3. Problem: Loss of Model Flexibility (Plasticity) in Continual Learning
- Some approaches (e.g., ECLIPSE) improve retention of old classes by freezing most model weights, but fail to generalize well to new tasks due to restricted model plasticity.
Solution
- Selective Fine-Tuning of the Transformer Decoder (see the sketch after this list):
  - Only the cross-attention layers and feed-forward networks (FFNs) in the transformer decoder are fine-tuned.
  - Self-attention layers stay fixed, balancing stability (retaining old knowledge) against plasticity (learning new knowledge).
  - Effect: Provides strong generalization to new classes while avoiding overfitting.
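In PyTorch, this selective freezing reduces to a simple parameter filter. The sketch below assumes parameter names contain identifiable "transformer_decoder", "cross_attn", and "ffn" substrings, as in common Mask2Former-style codebases; the actual names depend on the implementation.

```python
import torch.nn as nn

def configure_trainable_params(model: nn.Module):
    """Freeze everything, then unfreeze only the cross-attention and FFN
    weights inside the transformer decoder."""
    for p in model.parameters():
        p.requires_grad = False          # encoder, pixel decoder, self-attention all stay frozen

    for name, p in model.named_parameters():
        in_decoder = "transformer_decoder" in name
        tunable = ("cross_attn" in name) or ("ffn" in name)
        if in_decoder and tunable:
            p.requires_grad = True       # only these parameters receive gradient updates
```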
4. Problem: Inconsistency in Probability Scores for Base and Novel Classes
- In multi-step continual learning, knowledge distillation errors accumulate over time, making base and novel class probabilities inconsistent.
- Simple Probability-Level Fusion (PLF) leads to mismatches in scale, reducing accuracy.
Solution
- Query-Level Fusion (QLF):
  - Instead of merging probability scores directly, QLF keeps separate decoders for base and novel classes and fuses their per-query outputs (see the sketch after this list).
  - Effect: Avoids scale mismatches between base and novel probabilities and ensures accurate segmentation, outperforming probability-level fusion.
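A minimal sketch of one plausible reading of query-level fusion: each decoder's per-query class logits are lifted into the joint label space and the two query sets are simply concatenated, so base and novel scores never have to be compared on a common scale. Tensor shapes and the lifting scheme are assumptions, not the paper's verbatim implementation.

```python
import torch

def query_level_fusion(base_cls, base_masks, novel_cls, novel_masks,
                       num_base, num_novel):
    """Fuse base- and novel-decoder outputs at the query level.

    base_cls:  [Qb, num_base + 1]  class logits (last index = no-object)
    novel_cls: [Qn, num_novel + 1]
    *_masks:   [Q, H, W] mask logits from the corresponding decoder
    """
    total = num_base + num_novel + 1
    neg = torch.finfo(base_cls.dtype).min

    # Lift each decoder's logits into the joint class space; classes a
    # decoder does not predict get a very low score.
    lifted_base = base_cls.new_full((base_cls.shape[0], total), neg)
    lifted_base[:, :num_base] = base_cls[:, :num_base]
    lifted_base[:, -1] = base_cls[:, -1]

    lifted_novel = novel_cls.new_full((novel_cls.shape[0], total), neg)
    lifted_novel[:, num_base:num_base + num_novel] = novel_cls[:, :num_novel]
    lifted_novel[:, -1] = novel_cls[:, -1]

    # Concatenate the two query sets; base and novel probabilities are
    # never merged on a shared scale.
    fused_cls = torch.cat([lifted_base, lifted_novel], dim=0)    # [Qb + Qn, total]
    fused_masks = torch.cat([base_masks, novel_masks], dim=0)    # [Qb + Qn, H, W]
    return fused_cls, fused_masks
```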
Experimental Results
The paper evaluates ADAPT on ADE20K, a standard benchmark for panoptic segmentation.
- Improved Performance Across All Metrics:
  - ADAPT outperforms CoMFormer and ECLIPSE in both base and novel class segmentation.
  - PQ (Panoptic Quality) on base classes: +6.2 points over CoMFormer.
  - PQ on novel classes: +9.8 points over CoMFormer.
- Faster Training with Less Computational Cost:
  - ADAPT requires 2.27 hours for training, 34.8% faster than CoMFormer.
  - Uses fewer training iterations (4k vs. 16k in ECLIPSE) while achieving superior performance.
- More Efficient Inference:
  - Inference is 6.5% more efficient than ECLIPSE while improving novel-class performance by 59.4%.
Conclusion
ADAPT provides a well-balanced approach for continual panoptic segmentation, addressing the key challenges of catastrophic forgetting, scalability, and computational efficiency.
- Attentive Self-Distillation enhances knowledge retention while learning new tasks.
- Dual-Decoder Prediction Fusion ensures smooth integration of base and novel classes.
- Efficient Adaptation Strategy minimizes computational overhead while maintaining model flexibility.
These innovations position ADAPT as a state-of-the-art solution for continual learning in panoptic segmentation, offering a practical and scalable framework for real-world applications.