| Title | Year | Venue | Models |
|---|---|---|---|
| Functionality-Oriented LLM Merging on the Fisher–Rao Manifold | 2026 | arXiv | Qwen2.5-14B, Qwen2.5-14B-Instruct-1M, Qwen2.5-Coder-14B-Instruct, DeepSeek-R1-Distill-Qwen-14B, OpenReasoning-Nemotron-14B |
| The Appeal and Reality of Recycling LoRAs with Adaptive Merging | 2026 | arXiv | Llama-3.1-8B-Instruct |
| LS-Merge: Merging Language Models in Latent Space | 2026 | ICLR | Gemma-3-1B-it, Gemma-3-4B-it, Llama-3-1B-Instruct, Llama-2-7B |
| Bagging-Based Model Merging for Robust General Text Embeddings | 2026 | arXiv | Qwen3-4B |
| Data-driven Clustering and Merging of Adapters for On-device Large Language Models | 2026 | arXiv | Llama 3.2 3B, Qwen 2.5 1.5B, StableLM 2 1.6B |
| Improving Training Efficiency and Reducing Maintenance Costs via Language-Specific Model Merging | 2026 | arXiv | Llama-3.1-8B-Instruct |
| SimMerge: Learning to Select Merge Operators from Similarity Signals | 2026 | arXiv | Models from 7B to 111B |
| Multi-Stage Evolutionary Model Merging with Meta Data Driven Curriculum Learning for Sentiment-Specialized Large Language Modeling | 2026 | arXiv | |
| ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging | 2026 | arXiv | QwQ-32B-Preview, Meditron3-Qwen2.5-7B, MMed-Llama3-8B, WiroAI-Finance-Qwen-7B, WiroAI-Finance-Llama-8B |
| Reliable Cultural Knowledge Preservation in Multilingual LLMs through Model Merging | 2025 | arXiv | Qwen-2.5-3B |
| AlignMerge: Alignment-Preserving Large Language Model Merging via Fisher-Guided Geometric Constraints | 2025 | arXiv | LLaMA-3 8B, Mistral 7B, Qwen 2, Phi-3.5, Gemma 2 |
| Grow Up and Merge: Scaling Strategies for Efficient Language Adaptation | 2025 | arXiv | |
| Adapting Chat Language Models Using Only Target Unlabeled Language Data | 2025 | TMLR | Qwen2.5 7B, Llama 3.1 8B, Qwen3 14B |
| RCP-Merging: Merging Long Chain-of-Thought Models with Domain-Specific Models by Considering Reasoning Capability as Prior | 2026 | AAAI | Qwen2.5-7B, Llama3.1-8B |
| Souper-Model: How Simple Arithmetic Unlocks State-of-the-Art LLM Performance | 2025 | arXiv | xLAM-2-70b, CoALM-70B, watt-tool-70B, functionary-medium-70B, xLAM-2-8b, ToolACE-2-8B, watt-tool-8B, BitAgent-8B, CoALM-8B |
| SPEAR-MM: Selective Parameter Evaluation and Restoration via Model Merging for Efficient Financial LLM Adaptation | 2025 | arXiv | |
| Merging Continual Pretraining Models for Domain-Specialized LLMs: A Case Study in Finance | 2025 | arXiv | Llama-3-8B, Llama-2-7B |
| Extracting and Combining Abilities For Building Multi-lingual Ability-enhanced Large Language Models | 2025 | EMNLP | LLaMA-3 8B |
| Bridging Dialectal Gaps in Arabic Medical LLMs through Model Merging | 2025 | ArabicNLP | |
| Adapting Multilingual Models to Code-Mixed Tasks via Model Merging | 2025 | arXiv | |
| Harmonizing Diverse Models: A Layer-wise Merging Strategy for Consistent Generation | 2025 | arXiv | Llama-3.1-8B-Instruct, Gemma-3-12B-Instruct |
| ABC: Towards a Universal Code Styler through Model Merging | 2025 | Proc. ACM on Programming Languages | Qwen2.5-Coder, DeepSeek-Coder |
| Family Matters: Language Transfer and Merging for Adapting Small LLMs to Faroese | 2025 | arXiv | |
| Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking | 2025 | arXiv | Mistral-7B, InternVL, Qwen2-VL |
| The Thinking Spectrum: An Empirical Study of Tunable Reasoning in LLMs through Model Merging | 2025 | arXiv | Qwen3-30B-A3B-Thinking-2507, Qwen3-30B-A3B-Instruct-2507 |
| MLM: Multi-linguistic LoRA Merging | 2025 | NeurIPS Workshop | LLaMA-3.2 (1B and 3B) |
| Model Merging Scaling Laws in Large Language Models | 2025 | arXiv | Qwen2.5 (0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B) |
| Harnessing Optimization Dynamics for Curvature-Informed Model Merging | 2025 | arXiv | Llama-3.1-8B |
| Kwai Keye-VL 1.5 Technical Report | 2025 | arXiv | Keye-VL-8B |
| Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic | 2025 | arXiv | Qwen2.5-7B |
| Surrogate Benchmarks for Model Merging Optimization | 2025 | arXiv | EvoLLM-JP-v1-7B, shisa-gamma-7b-v1 |
| Tensorized Clustered LoRA Merging for Multi-Task Interference | 2025 | arXiv | Mistral-7B |
| Efficient Compositional Multi-tasking for On-device Large Language Models | 2025 | arXiv | Llama 3.1 70B |
| HydraOpt: Navigating the Efficiency-Performance Trade-off of Adapter Merging | 2025 | arXiv | |
| Exploring Sparse Adapters for Scalable Merging of Parameter Efficient Experts | 2025 | arXiv | |
| Merging Large Language Models for Enhanced Code Generation: A Comparative Study of Model Merging Techniques Across Programming Languages | 2025 | DiVA (open access) | CodeQwen1.5-7B, DeepSeek-Coder-6.7B-Base, CodeLlama-34B |
| On Fairness of Task Arithmetic: The Role of Task Vectors | 2025 | arXiv | LLaMA2-7B |
| The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs | 2025 | arXiv | Falcon 3 7B, Qwen2.5 7B Instruct, Llama 3.1 8B Instruct, Aya Expanse 8B |
| Model Merging is Secretly Certifiable: Non-Vacuous Generalisation Bounds for Low-Shot Learning | 2025 | arXiv | MetaMath-Mistral-7B, Dolphin-2.1-Mistral-7B, Speechless-Code-Mistral-7B-v1.0 |
| Training-free LLM Merging for Multi-task Learning | 2025 | ACL | Echelon-AI/Med-Qwen2-7B, shtdbb/qwen2-7b-med, Qwen2-Instruct |
| ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost | 2025 | arXiv | Llama3-inst-70B, Llama3-base-70B, Llama3.1-base-70B |
| Beyond ‘Aha!’: Toward Systematic Meta-Abilities Alignment in Large Reasoning Models | 2025 | arXiv | Qwen2.5-7B, Qwen2.5-32B |
| Unified Multi-Task Learning & Model Fusion for Efficient Language Model Guardrailing | 2025 | arXiv | |
| Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging -- An Open Recipe | 2025 | arXiv | Typhoon2 R1 70B, DeepSeek R1 70B |
| Efficient Model Development through Fine-tuning Transfer | 2025 | arXiv | Llama 3.1 8B |
| Command A: An Enterprise-Ready Large Language Model | 2025 | arXiv | Command R7B |
| Extrapolation Merging: Keep Improving With Extrapolation and Merging | 2025 | arXiv | Qwen2-7B, Meta-Llama-3-8B, Mistral-Nemo-Base-2407-12B, Qwen1.5-14B |
| Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | 2025 | arXiv | Light-R1-32B |
| FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion | 2025 | arXiv | Gemma-2-27B-it, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, Llama-3.1-70B-Instruct |
| Superficial Self-Improved Reasoners Benefit from Model Merging | 2025 | arXiv | Llama2-7B |
| Nature-Inspired Population-Based Evolution of Large Language Models | 2025 | arXiv | |
| Layer-Aware Task Arithmetic: Disentangling Task-Specific and Instruction-Following Knowledge | 2025 | arXiv | Gemma-2-9B, Llama-3-8B |
| Mixup Model Merge: Enhancing Model Merging Performance through Randomized Linear Interpolation | 2025 | arXiv | WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca |
| LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging | 2025 | arXiv | NuminaMath-7B, DeepSeek-Math-7B-Base, LLaMA-series models, WizardMath-13B |
| Merging Language and Domain Specific Models: The Impact on Technical Vocabulary Acquisition | 2025 | arXiv | ContactDoctor-8B |
| Transferring Textual Preferences to Vision-Language Understanding through Model Merging | 2025 | arXiv | Llama-3.2-11B-Vision-Instruct, Llama-3.1-Tulu-2-8B-uf-mean-rm, Llama-3.1-Tulu-3-8B-RM |
| Optimal Brain Iterative Merging: Mitigating Interference in LLM Merging | 2025 | arXiv | Llama-2-13b, WizardMath-13B-V1.0, WizardLM-13B-V1.2, llama-2-13b-code-alpaca |
| An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging | 2025 | arXiv | Typhoon2 70B Instruct, DeepSeek R1 70B Distill, Llama 3.1 70B, Llama 3.3 70B |
| Fine, I’ll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging | 2025 | arXiv | WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca |
| Skill Expansion and Composition in Parameter Space | 2025 | arXiv | |
| InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion | 2025 | arXiv | Qwen2.5-Coder-14B-Instruct, Qwen2.5-14B-Instruct, Mistral-Small-24B-Instruct-2501 |
| Channel Merging: Preserving Specialization for Merged Experts | 2025 | AAAI | Dolphin-2.2.1-Mistral-7B, Speechless-Code-Mistral-7B, MetaMath-Mistral-7B, Chinese-Mistral-7B-Instruct-v0.1 |
| Weighted-Reward Preference Optimization for Implicit Model Fusion | 2025 | ICLR | LLaMA3-8B-Instruct |
| Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion | 2024 | arXiv | MiniGemini-8B, SLIME-8B |
| AgentMerge: Enhancing Generalization in Fine-Tuned LLM Agents | 2024 | arXiv | Llama3.1-8B |
| JRadiEvo: A Japanese Radiology Report Generation Model Enhanced by Evolutionary Optimization of Model Merging | 2024 | arXiv | Bunny-v1_1-Llama-3-8B-V, MMed-Llama-3-8B-EnIns, OpenBioLLM-Llama3-8B, Llama-3-Swallow-8B-Instruct-v0.1 |
| If You Can’t Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs | 2024 | arXiv | Command R+ 104B |
| Agent Skill Acquisition for Large Language Models via CycleQD | 2024 | arXiv | Llama3-8B-Instruct |
| Collaboratively Adding New Knowledge to an LLM | 2024 | arXiv | Meta-Llama-3-8B |
| Unconstrained Model Merging for Enhanced LLM Reasoning | 2024 | arXiv | CodeLlama-7B-Ins, CodeLlama-70B-Ins, DeepSeek-Coder-Ins-v1.5, Qwen2.5-Math-7B-Ins, WizardMath-7B-V1.1, OpenMath-Mistral 7B, MetaMath-7B, MetaMath-70B |
| LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks | 2024 | arXiv | Llama-7b, Llama2-7b-chat |
| Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging | 2024 | arXiv | Llama 2 7B |
| Exploring Model Kinship for Merging Large Language Models | 2024 | arXiv | Mistral-7B, Mistral-7b-instruct-v0.2, MetaMath-mistral-7b, Open-chat-3.5-1210 |
| Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation | 2024 | arXiv | shisa-gamma-7b, WizardMath-7B-V1.1, Abel-7B-002, Llama-3-SauerkrautLM-8b-Instruct, Llama-3-Open-Ko-8B, llama-3-sqlcoder-8b, Meta-Llama-3-8B |
| Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models | 2024 | arXiv | Llama 3.1 8B |
| What Matters for Model Merging at Scale? | 2024 | arXiv | PaLM-2 (1B, 8B, 24B, 64B), PaLM-2-IT (1B, 8B, 24B, 64B) |
| HM3: Hierarchical Multi-Objective Model Merging for Pretrained Models | 2024 | arXiv | Llama-2-7B-Chat, WizardMath-7B, CodeLlama-7B |
| FuseChat: Knowledge Fusion of Chat Models | 2024 | arXiv | OpenChat-3.5-7B, Starling-LM-7B-alpha, NH2-SOLAR-10.7B, InternLM2-Chat-20B, Mixtral-8x7B-Instruct, Qwen-1.5-Chat-72B |
| SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging | 2024 | arXiv | CodeLlama 7B |
| It’s Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization | 2024 | arXiv | Qwen1.5-7B-Chat, Liberated-Qwen1.5-7B, firefly-qwen1.5-en-7B |
| Knowledge Fusion By Evolving Weights of Language Models | 2024 | ACL | |
| LLM Merging: Building LLMs Efficiently through Merging | 2024 | NeurIPS 2024 Competition Track | LLaMA-7B, Mistral-7B, Gemma-7B |
| Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement | 2024 | arXiv | Qwen1.5-7B, Qwen1.5-Chat-7B, Sailor-7B, Qwen1.5-14B, Qwen1.5-Chat-14B, Sailor-14B, WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca |
| MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic | 2024 | arXiv | LLaMA-2-7B, Mistral-7B, LLaMA-2-13B |
| Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models | 2024 | arXiv | Mistral-Instruct-7B, Mixtral-Instruct-8x7B |
| Knowledge Fusion of Large Language Models | 2024 | ICLR | Llama-2 7B, OpenLLaMA 7B, MPT 7B |
| Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | 2024 | ICML | WizardLM-13B, WizardMath-13B, llama-2-13b-code-alpaca, Mistral-7B |
| Controlled Text Generation via Language Model Arithmetic | 2024 | ICML | MPT-7B, Pythia-12B, Llama-2-Chat-13B |
| MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models | 2024 | arXiv | LLaMA2-13B, LLaMA3-8B (LoRA) |
| Evolutionary Optimization of Model Merging Recipes | 2024 | arXiv | shisa-gamma-7b-v1, WizardMath-7B-V1.1, Arithmo2-Mistral-7B, Abel-7B-002, Mistral-7B-v0.1, LLaVA-1.6-Mistral-7B |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | 2024 | arXiv | Llama-2-7B |
| Knowledge Fusion of Chat LLMs: A Preliminary Technical Report | 2024 | arXiv | NH2-Mixtral-8x7B, NH2-Solar-10.7B, OpenChat-3.5-7B |