ADAPT: Attentive Self-Distillation and Dual-Decoder Prediction Fusion for Continual Panoptic Segmentation (ICLR 2025)

This paper addresses Continual Panoptic Segmentation (CPS), a task that requires a model to perform both semantic and instance segmentation while new classes arrive over a sequence of learning steps. Existing CPS methods suffer from catastrophic forgetting, poor scalability, and inefficient adaptation to new classes.

To tackle these problems, the paper proposes ADAPT, a novel framework that integrates:

  1. Attentive Self-Distillation – an adaptive knowledge-distillation strategy that concentrates distillation on the most informative queries.
  2. Dual-Decoder Prediction Fusion – a mechanism that balances retention of base classes with learning of novel classes while avoiding probability-scale inconsistency.
  3. Efficient Adaptation Strategy – freezing the image encoder and pixel decoder while fine-tuning selected transformer-decoder components, for efficiency without sacrificing flexibility.

Problems and Solutions

1. Problem: Catastrophic Forgetting in Continual Learning

  • When learning new tasks incrementally, models often forget previously learned knowledge.
  • This happens because prior classes are unlabeled (effectively treated as background) in later training stages, biasing the model toward the newly added classes.

Solution

  • Attentive Self-Distillation: instead of treating all pixels or instances equally, ADAPT re-weights the distillation importance of each query using the teacher model's background confidence (see the sketch after this list).
  • Effect: This reduces forgetting of old classes while ensuring that newly learned tasks do not override past knowledge.
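
The weighting is not reproduced exactly here, but the idea can be sketched in a few lines of PyTorch. The sketch assumes per-query classification logits whose last index is the background (no-object) class, and it scales each query's distillation term by how strongly the teacher believes the query covers foreground; the paper's exact weighting function may differ.

```python
import torch
import torch.nn.functional as F

def attentive_distillation_loss(student_logits, teacher_logits, bg_index=-1):
    """Distill class predictions query by query, down-weighting queries
    the teacher confidently assigns to background.

    student_logits / teacher_logits: (num_queries, num_classes + 1)
    tensors, where `bg_index` is the background / no-object class.
    """
    with torch.no_grad():
        teacher_probs = teacher_logits.softmax(dim=-1)
        # Queries the teacher sees as foreground get higher weight.
        weights = 1.0 - teacher_probs[:, bg_index]  # (num_queries,)

    # Standard soft-label KD term, kept per query so it can be re-weighted.
    per_query_kd = F.kl_div(
        student_logits.log_softmax(dim=-1),
        teacher_probs,
        reduction="none",
    ).sum(dim=-1)  # (num_queries,)

    return (weights * per_query_kd).sum() / weights.sum().clamp(min=1e-6)
```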

2. Problem: High Computational Cost of Knowledge Distillation

  • Traditional Knowledge Distillation (KD) requires separate forward passes for teacher and student models, significantly increasing computational overhead.
  • Methods like CoMFormer also rely on full-model fine-tuning, which is computationally expensive.

Solution

  • Shared Forward Passes: ADAPT freezes the image encoder and pixel decoder, so a single forward pass through them can be shared by the teacher and student models (see the sketch after this list).
  • Effect: Reduces training time by 34.8% compared to CoMFormer and eliminates redundant computation.
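
A minimal sketch of how such a shared pass can be structured, assuming a Mask2Former-style split into encoder, pixel decoder, and transformer decoder; the module and function names here are placeholders, not the paper's API.

```python
import torch

@torch.no_grad()
def shared_backbone_pass(encoder, pixel_decoder, images):
    """The image encoder and pixel decoder are frozen, so this pass is
    computed once per batch and reused by both teacher and student."""
    return pixel_decoder(encoder(images))

def training_step(encoder, pixel_decoder, teacher_decoder, student_decoder, images):
    # One backbone pass instead of two: no gradients are needed because
    # the encoder and pixel decoder are frozen.
    feats = shared_backbone_pass(encoder, pixel_decoder, images)

    with torch.no_grad():
        teacher_preds = teacher_decoder(feats)  # distillation targets only

    student_preds = student_decoder(feats)      # the only trainable branch
    return student_preds, teacher_preds
```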

3. Problem: Loss of Model Flexibility (Plasticity) in Continual Learning

  • Some approaches (e.g., ECLIPSE) improve retention of old classes by freezing most model weights, but generalize poorly to new tasks because the frozen weights restrict model plasticity.

Solution

  • Selective Fine-Tuning of the Transformer Decoder (see the sketch after this list):
    • Only the cross-attention layers and feed-forward networks (FFNs) in the transformer decoder are fine-tuned.
    • Self-attention layers are kept fixed, balancing rigidity (retention of old knowledge) against plasticity (learning of new knowledge).
  • Effect: Provides strong generalization ability while avoiding overfitting.
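
A minimal sketch of the selective unfreezing, assuming PyTorch parameter names containing "cross_attn" and "ffn"; the real name patterns depend on the decoder implementation.

```python
import torch.nn as nn

def configure_decoder_for_step(transformer_decoder: nn.Module) -> None:
    """Unfreeze only cross-attention and FFN parameters of the transformer
    decoder; self-attention (and everything else) stays frozen. The name
    substrings below are illustrative -- actual parameter names depend on
    how the decoder modules are defined."""
    for name, param in transformer_decoder.named_parameters():
        param.requires_grad = ("cross_attn" in name) or ("ffn" in name)
```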

4. Problem: Inconsistency in Probability Scores for Base and Novel Classes

  • In multi-step continual learning, knowledge-distillation errors accumulate over steps, so the probability scores predicted for base and novel classes drift onto inconsistent scales.
  • Simple Probability-Level Fusion (PLF) merges these scores directly, so the scale mismatch reduces accuracy.

Solution

  • Query-Level Fusion (QLF):
    • Instead of merging probability scores directly, QLF keeps separate decoders for base and novel classes and fuses their outputs at the query level (see the sketch after this list).
    • Effect: Avoids probability-scale mismatches and yields more accurate segmentation than probability-level fusion.
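
One plausible reading of the fusion step, sketched under assumed tensor shapes: each decoder's queries stay intact and score only the classes that decoder was trained on, so no probability rescaling across decoders is needed. This is an illustration, not the paper's exact fusion rule.

```python
import torch

def query_level_fusion(base_logits, base_masks, novel_logits, novel_masks):
    """Fuse base- and novel-decoder outputs at the query level.

    base_logits:  (Qb, Cb)    novel_logits: (Qn, Cn)
    base_masks:   (Qb, H, W)  novel_masks:  (Qn, H, W)
    Shapes are assumptions for this sketch.
    """
    # Queries (and their masks) from both decoders are simply pooled.
    masks = torch.cat([base_masks, novel_masks], dim=0)  # (Qb + Qn, H, W)

    qb, cb = base_logits.shape
    qn, cn = novel_logits.shape
    neg = torch.finfo(base_logits.dtype).min

    # Each query scores only its own decoder's classes; other entries
    # are filled with -inf so they never win, avoiding any cross-decoder
    # probability rescaling.
    logits = torch.full((qb + qn, cb + cn), neg, dtype=base_logits.dtype)
    logits[:qb, :cb] = base_logits
    logits[qb:, cb:] = novel_logits
    return logits, masks
```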

Experimental Results

The paper evaluates ADAPT on ADE20K, a standard benchmark for panoptic segmentation.

  • Improved Performance Across All Metrics:
    • ADAPT outperforms CoMFormer and ECLIPSE in both base and novel class segmentation.
    • PQ (Panoptic Quality) on base classes: +6.2 points over CoMFormer.
    • PQ on novel classes: +9.8 points over CoMFormer.
  • Faster Training with Less Computational Cost:
    • ADAPT requires 2.27 hours for training, 34.8% faster than CoMFormer.
    • Uses fewer training iterations (4k vs. 16k in ECLIPSE) while achieving superior performance.
  • More Efficient Inference:
    • Inference is 6.5% more efficient than ECLIPSE while improving performance by 59.4% on novel classes.

Conclusion

ADAPT provides a well-balanced approach for continual panoptic segmentation, addressing the key challenges of catastrophic forgetting, scalability, and computational efficiency.

  1. Attentive Self-Distillation enhances knowledge retention while learning new tasks.
  2. Dual-Decoder Prediction Fusion ensures smooth integration of base and novel classes.
  3. Efficient Adaptation Strategy minimizes computational overhead while maintaining model flexibility.

These innovations position ADAPT as a state-of-the-art solution for continual learning in panoptic segmentation, offering a practical and scalable framework for real-world applications.