---
license: apache-2.0
language:
- en
base_model:
- microsoft/Phi-3-mini-4k-instruct
pipeline_tag: image-text-to-text
---

## LibMoE: A Library for Comprehensive Benchmarking of Mixture of Experts in Large Language Models

### Introduction

Mixture of Experts (MoE) models play an essential role in the development of more efficient and effective large language models (LLMs). Due to the enormous resource requirements, studying large-scale MoE algorithms remains inaccessible to many researchers. This work introduces LibMoE, a comprehensive and modular framework designed to streamline the research, training, and evaluation of MoE algorithms. Built upon three core principles: (i) modular design, (ii) efficient training, and (iii) comprehensive evaluation, LibMoE makes MoEs in LLMs more accessible to a wider range of researchers by standardizing the training and evaluation pipelines. Using LibMoE, we extensively benchmarked five state-of-the-art MoE algorithms across three different LLMs and 11 datasets under a zero-shot setting. The results show that, despite their unique characteristics, all MoE algorithms perform similarly when averaged across a broad range of tasks. With its modular design and extensive evaluation capabilities, we believe LibMoE will be invaluable for researchers striving to make meaningful progress toward the next generation of MoE and LLMs.

### Model and Evaluation Benchmarks

We have released models for five MoE algorithms, each built on `microsoft/Phi-3-mini-4k-instruct` as the language model and `SigLIP` as the vision encoder. These models were trained on the [LLaVA-665K dataset](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K). We evaluated these state-of-the-art algorithms on 11 benchmarks, examining various aspects of MoE algorithm performance.

| Model | MoE Method | AI2D | Text VQA | GQA | Hallusion Benchmark | MathVista Validation | MMBench EN / dev | MMMU Validation | MMStar | POPE | SQA IMG Full | MME | AVG |
|---------------------|---------------------|-------|----------|-------|---------------------|----------------------|------------------|-----------------|--------|--------|--------------|-----------|-------|
| SigLIP 224 + Phi3 | SMoE-R | 64.35 | 40.35 | 60.03 | **41.75** | 28.7 | 67.96 | 40.22 | 39.47 | 84.31 | 80.71 | 1,655.81 | 54.78 |
| | Cosine-R | 64.6 | **41.98** | 60.74 | 41.43 | 31.3 | 70.61 | 41.22 | 38.5 | 86.33 | 81.49 | 1,759.21 | 55.82 |
| | Sigmoid-R | 64.66 | 41.05 | 60.52 | 40.8 | 28.8 | 69.07 | 40.89 | 39.29 | 86.54 | 80.85 | 1,766.03 | 55.25 |
| | Hyper-R | **65.12** | 41.67 | 59.88 | 41.32 | 30.3 | 69.33 | 41.44 | 39.86 | 85.4 | 79.03 | 1,752.39 | 55.34 |
| | Perturbed Cosine-R | 64.8 | 41.89 | **61.0** | 40.9 | **31.8** | **70.7** | **42.0** | **39.6** | **86.43** | **81.44** | **1,776.54** | **56.06** |

### Run LibMoE

We provide detailed instructions for setting up and running experiments in this repository: [https://github.com/Fsoft-AIC/LibMoE](https://github.com/Fsoft-AIC/LibMoE)

### Hardware Resources

| Stage | MoE Method | Hardware |
|-------------------|----------------------|-----------|
| Pre-Training | | 4xA100 |
| Pre-FineTuning | | 4xA100 |
| VIT | SMoE-R | 6xA100 |
| | Cosine-R | 6xA100 |
| | Sigmoid-R | 6xA100 |
| | Hyper-R | 6xA100 |
| | Perturbed Cosine-R | 6xA100 |

---

### Citation Information

More details can be found in our paper. If you use LibMoE, please cite it using this BibTeX:

```
```
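
### Note on the AVG Column

The benchmark table under "Model and Evaluation Benchmarks" does not state how AVG is computed. The numbers are consistent with an unweighted mean over the ten benchmarks other than MME (which is reported on a different scale), but this is an inference from the reported values, not something the model card confirms. The following minimal sketch recomputes that mean from the scores copied out of the table; the variable names are illustrative and are not part of LibMoE.

```python
# Scores copied from the benchmark table, in column order:
# AI2D, Text VQA, GQA, Hallusion, MathVista, MMBench EN, MMMU, MMStar, POPE, SQA IMG.
# MME is deliberately left out (assumption: AVG excludes it).
scores = {
    "SMoE-R":             [64.35, 40.35, 60.03, 41.75, 28.7, 67.96, 40.22, 39.47, 84.31, 80.71],
    "Cosine-R":           [64.6, 41.98, 60.74, 41.43, 31.3, 70.61, 41.22, 38.5, 86.33, 81.49],
    "Sigmoid-R":          [64.66, 41.05, 60.52, 40.8, 28.8, 69.07, 40.89, 39.29, 86.54, 80.85],
    "Hyper-R":            [65.12, 41.67, 59.88, 41.32, 30.3, 69.33, 41.44, 39.86, 85.4, 79.03],
    "Perturbed Cosine-R": [64.8, 41.89, 61.0, 40.9, 31.8, 70.7, 42.0, 39.6, 86.43, 81.44],
}

# AVG values as reported in the table.
reported_avg = {
    "SMoE-R": 54.78,
    "Cosine-R": 55.82,
    "Sigmoid-R": 55.25,
    "Hyper-R": 55.34,
    "Perturbed Cosine-R": 56.06,
}


def mean(values):
    """Unweighted arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Recompute the mean for each method and compare it with the reported AVG.
computed_avg = {name: mean(vals) for name, vals in scores.items()}

for name, value in computed_avg.items():
    diff = abs(value - reported_avg[name])
    print(f"{name:20s} computed={value:.3f} reported={reported_avg[name]:.2f} diff={diff:.3f}")
```

Under this assumption, each recomputed mean agrees with the reported AVG to within rounding at the second decimal place.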