📝 Publications

ICLR 2025

REMEDY: Recipe Merging Dynamics in Large Vision-Language Models.
Didi Zhu, Yibing Song, Tao Shen, Ziyu Zhao, Jinluan Yang, Min Zhang, Chao Wu

  • First exploration of the LoRA fusion problem in Multimodal Large Language Models (MLLMs).
  • Proposed a dynamic fusion scheme that enhances the zero-shot generalization capability of MLLMs; a toy sketch follows below.
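
For intuition, here is a minimal sketch of dynamic LoRA fusion: several adapters are merged into one weight matrix with data-dependent gate weights. The softmax gating and all names are illustrative assumptions, not REMEDY's actual recipe.

```python
import torch

def merge_loras(base_weight, loras, gate_logits):
    """Merge several LoRA deltas into one weight matrix.

    base_weight: (out, in) frozen pretrained weight
    loras: list of (A, B) pairs with A: (r, in), B: (out, r)
    gate_logits: (num_loras,) routing scores (dynamic in practice,
    e.g. produced from the input; hypothetical here)
    """
    gates = torch.softmax(gate_logits, dim=0)
    merged = base_weight.clone()
    for g, (A, B) in zip(gates, loras):
        merged += g * (B @ A)  # each low-rank update, scaled by its gate
    return merged

# toy usage with random adapters
base = torch.randn(8, 16)
loras = [(torch.randn(4, 16), torch.randn(8, 4)) for _ in range(3)]
print(merge_loras(base, loras, torch.randn(3)).shape)  # torch.Size([8, 16])
```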
ICML 2024

Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models.
Didi Zhu, Zhongyisun Sun, Zexi Li, Tao Shen, Ke Yan, Shouhong Ding, Chao Wu, Kun Kuang

  • Presented the first comprehensive study of catastrophic forgetting in MLLMs such as InstructBLIP and LLaVA.
  • Addressed the issue with a training-free model grafting technique; see the sketch below.
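
As a rough sketch of the grafting idea: start from the pretrained weights and transplant only the small fraction of fine-tuned parameters that changed the most. The magnitude criterion and keep_ratio below are simplifying assumptions, not the paper's exact selection rule.

```python
import torch

def graft(pretrained, finetuned, keep_ratio=0.1):
    """Training-free grafting: keep pretrained weights, copy over only
    the top-|delta| fraction of fine-tuned entries (illustrative rule)."""
    grafted = {}
    for name, w0 in pretrained.items():
        delta = finetuned[name] - w0
        k = max(1, int(keep_ratio * delta.numel()))
        # k-th largest |delta| acts as the keep threshold
        thresh = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        mask = (delta.abs() >= thresh).float()
        grafted[name] = w0 + mask * delta
    return grafted

# toy usage on a single weight tensor
pre = {"w": torch.randn(4, 4)}
ft = {"w": pre["w"] + 0.5 * torch.randn(4, 4)}
print(graft(pre, ft, keep_ratio=0.25)["w"].shape)  # torch.Size([4, 4])
```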
KDD 2024

Neural Collapse Anchored Prompt Tuning for Generalizable Vision-Language Models.
Didi Zhu, Zexi Li, Min Zhang, Junkun Yuan, Jiashuo Liu, Kun Kuang, Chao Wu

  • First exploration of large vision-language models through the lens of neural collapse in deep learning theory.
  • Tackled class imbalance in generalization tasks for large vision-language models by leveraging neural collapse theory; a minimal sketch follows below.
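
A minimal sketch of the neural-collapse anchor: build a simplex equiangular tight frame (ETF), the maximally separated geometry that class means provably collapse to, and pull class prototypes (e.g. prompt-induced text features) toward it. The cosine loss is an assumed form for illustration, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def simplex_etf(num_classes, dim):
    """K unit vectors with pairwise cosine -1/(K-1): a simplex ETF."""
    assert dim >= num_classes
    I = torch.eye(num_classes)
    J = torch.ones(num_classes, num_classes) / num_classes
    M = (num_classes / (num_classes - 1)) ** 0.5 * (I - J)  # rows: ETF in R^K
    Q, _ = torch.linalg.qr(torch.randn(dim, num_classes))   # orthonormal embed
    return M @ Q.T  # (K, dim) fixed anchor directions

def etf_anchor_loss(prototypes, etf):
    """Cosine-align learned class prototypes with the fixed ETF directions."""
    p = F.normalize(prototypes, dim=-1)
    e = F.normalize(etf, dim=-1)
    return (1 - (p * e).sum(-1)).mean()

etf = simplex_etf(num_classes=10, dim=512)
protos = torch.randn(10, 512, requires_grad=True)  # e.g. text prototypes
print(etf_anchor_loss(protos, etf))  # scalar to add to the tuning objective
```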
ICCV 2023

Universal Domain Adaptation via Compressive Attention Matching.
Didi Zhu, Yinchuan Li, Junkun Yuan, Zexi Li, Kun Kuang, Chao Wu

  • Addressed inconsistent source-target label spaces in Universal Domain Adaptation by directly leveraging self-attention in ViT; see the sketch below.
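
A toy illustration of the attention-matching intuition: compare a target sample's ViT attention map with per-class source attention prototypes, and treat the sample as target-private ("unknown") when even the closest class diverges too much. The KL measure and threshold are assumptions, not the paper's compressive matching procedure.

```python
import torch

def min_attention_divergence(attn_t, attn_s):
    """attn_t: (P,) patch attention of a target sample;
    attn_s: (K, P) per-class source attention prototypes."""
    t = attn_t / attn_t.sum()
    s = attn_s / attn_s.sum(dim=-1, keepdim=True)
    kl = (s * (s / t).log()).sum(-1)  # KL(class prototype || sample)
    return kl.min()

def is_unknown(attn_t, attn_s, tau=0.5):
    # large divergence to every known class -> likely a private class
    return min_attention_divergence(attn_t, attn_s) > tau

attn_t = torch.softmax(torch.randn(196), dim=0)       # 14x14 patches
attn_s = torch.softmax(torch.randn(10, 196), dim=-1)  # 10 source classes
print(is_unknown(attn_t, attn_s))
```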
ACM Multimedia 2023

Generalized Universal Domain Adaptation with Generative Flow Networks.
Didi Zhu, Yinchuan Li, Yunfeng Shao, Jianye Hao, Fei Wu, Kun Kuang, Jun Xiao, Chao Wu

  • Introduced Generalized Universal Domain Adaptation (GUDA), a comprehensive problem that unifies all domain adaptation sub-problems involving label heterogeneity.
  • Implemented an exploration-aware active learning strategy based on Generative Flow Networks to effectively address GUDA; a toy stand-in is sketched below.
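
A full GFlowNet is beyond a few lines, so as a toy stand-in for the exploration idea: select labeling candidates in proportion to a reward instead of greedily taking the top scores. This reward-proportional sampling is a plainly named simplification, not the paper's method.

```python
import torch

def select_batch(rewards, batch_size):
    """Sample a batch of candidate indices with probability proportional
    to softmax(reward), trading off exploitation for exploration."""
    probs = torch.softmax(rewards, dim=0)
    return torch.multinomial(probs, batch_size, replacement=False)

rewards = torch.randn(100)       # e.g. uncertainty scores of target samples
print(select_batch(rewards, 8))  # indices of samples to query for labels
```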