AB  - Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse natural language processing (NLP) tasks. The release of open-source LLMs like LLaMA and Qwen has triggered the development of numerous fine-tuned models tailored for various tasks and languages. In this paper, we explore an important question: is it possible to combine these specialized models to create a unified model with multi-task capabilities? We introduce Hierarchical Iterative Merging (Hi-Merging), a training-free method for unifying different specialized LLMs into a single model. Specifically, Hi-Merging employs model-wise and layer-wise pruning and scaling, guided by contribution analysis, to mitigate parameter conflicts. Extensive experiments on multiple-choice and question-answering tasks in both Chinese and English validate Hi-Merging's ability for multi-task learning. The results demonstrate that Hi-Merging consistently outperforms existing merging techniques and surpasses the performance of models fine-tuned on combined datasets in most scenarios. Code is available at Applied-Machine-Learning-Lab/Hi-Merging.
AU  - Fu, Z
AU  - Wu, X
AU  - Wang, Y
AU  - Wang, W
AU  - Ye, S
AU  - Yin, H
AU  - Chang, Y
AU  - Zheng, Y
AU  - Zhao, X
DA  - 2025/01/01
DO  - 10.18653/v1/2025.acl-long.1588
EP  - 33124
JO  - Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
PB  - Association for Computational Linguistics (ACL)
PY  - 2025/01/01
SP  - 33111
TI  - Training-free LLM Merging for Multi-task Learning
VL  - 1
Y1  - 2025/01/01
Y2  - 2026/07/02
ER  -