This is the codebase for:
Not All Models Suit Expert Offloading:
On Local Routing Consistency of Mixture-of-Expert Models
Set up a virtual environment with Python 3.13, and run pip install -r requirements.txt
to install dependencies. You will also need scattermoe and smoe to run LLaMA-MoE-v2.
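As a quick sanity check after installation, the following sketch reports whether the interpreter matches the expected Python version and whether the two extra packages can be found. It is not part of the codebase: the helper names `python_version_ok` and `missing_modules` are hypothetical, and it assumes the packages are importable under the module names `scattermoe` and `smoe`, which may differ from the actual import names.

```python
import importlib.util
import sys

# Assumption: the extra packages are importable under these module names.
EXTRA_MODULES = ("scattermoe", "smoe")


def python_version_ok(required=(3, 13), current=None):
    """Return True if the (major, minor) version equals `required`."""
    if current is None:
        current = sys.version_info[:2]
    return tuple(current[:2]) == tuple(required)


def missing_modules(names=EXTRA_MODULES):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    if not python_version_ok():
        print("Warning: expected Python 3.13, found", sys.version.split()[0])
    missing = missing_modules()
    if missing:
        print("Not found (needed only for LLaMA-MoE-v2):", ", ".join(missing))
    else:
        print("scattermoe and smoe were found.")
```

Run it inside the activated virtual environment. A missing package only matters if you intend to run LLaMA-MoE-v2.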
@misc{liang2025modelssuitexpertoffloading,
title={Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models},
author={Jingcong Liang and Siyuan Wang and Miren Tian and Yitong Li and Duyu Tang and Zhongyu Wei},
year={2025},
eprint={2505.16056},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.16056},
}