---
license: apache-2.0
---

<h1 align="center">
  <img alt="Drop-Upcycling" src="images/drop-upcycling.png"><br>
  <b>Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization</b><br>
</h1>

<p align="center">
  📄 <a href="https://openreview.net/forum?id=gx1wHnf5Vp">[Paper]</a> |
  🤗 <a href="https://huggingface.co/collections/llm-jp/drop-upcycling-674dc5be7bbb45e12a476b80">[Hugging Face]</a> |
  🌐 <a href="https://gitlab.llm-jp.nii.ac.jp/datasets/llm-jp-corpus-v3">[Dataset]</a> |
  💻 <a href="https://github.com/Taishi-N324/Drop-Upcycling">[Code]</a> |
  📊 <a href="https://wandb.ai/taishi-nakamura/Drop-Upcycling">[Log]</a>
</p>

# Model Index

We provide model checkpoints for all experiments to ensure reproducibility of the results presented in Tables 1 and 2 of the paper.

## Table 1

| Model | Link |
|---|---|
| 1 Dense 152M | [Link](https://huggingface.co/llm-jp/Dense-152M) |
| 2 MoE FS 8x152M | [Link](https://huggingface.co/llm-jp/FS-8x152M) |
| 3 MoE BTX 8x152M | [Link](https://huggingface.co/llm-jp/BTX-8x152M) |
| 4 MoE NU 8x152M | [Link](https://huggingface.co/llm-jp/NU-8x152M) |
| 5 MoE RNU (r=0.5) 8x152M | [Link](https://huggingface.co/llm-jp/RNU-0.5-8x152M) |
| 6 MoE DU (r=0.5) 8x152M | [Link](https://huggingface.co/llm-jp/DU-0.5-8x152M) |
| 7 MoE DU (r=1.0) 8x152M | [Link](https://huggingface.co/llm-jp/DU-1.0-8x152M) |
| 8 Dense 1.5B | [Link](https://huggingface.co/llm-jp/Dense-1.5B) |
| 9 MoE FS 8x1.5B | [Link](https://huggingface.co/llm-jp/FS-8x1.5B) |
| 10 MoE BTX 8x1.5B | [Link](https://huggingface.co/llm-jp/BTX-8x1.5B) |
| 11 MoE NU 8x1.5B | [Link](https://huggingface.co/llm-jp/NU-8x1.5B) |
| 12 MoE RNU (r=0.5) 8x1.5B | [Link](https://huggingface.co/llm-jp/RNU-0.5-8x1.5B) |
| 13 MoE DU (r=0.5) 8x1.5B | [Link](https://huggingface.co/llm-jp/DU-0.5-8x1.5B) |
| 14 MoE DU (r=1.0) 8x1.5B | [Link](https://huggingface.co/llm-jp/DU-1.0-8x1.5B) |
## Table 2

| Model | Link |
|---|---|
| 1 Dense 3.7B | [Link](https://huggingface.co/llm-jp/Dense-3.7B) |
| 2 MoE FS 8x3.7B | [Link](https://huggingface.co/llm-jp/FS-8x3.7B) |
| 3 MoE DU (r=0.5) 8x3.7B | [Link](https://huggingface.co/llm-jp/DU-0.5-8x3.7B) |
| 4 Dense 13B | [Link](https://huggingface.co/llm-jp/Dense-13B) |
| 5 Dense 3.7B (llm-jp-3-3.7b) | [Link](https://huggingface.co/llm-jp/llm-jp-3-3.7b) |
## BTX Experts

| Model | Link |
|---|---|
| Japanese expert 152M | [Link](https://huggingface.co/llm-jp/Dense-btx-japanese-expert-152M) |
| English expert 152M | [Link](https://huggingface.co/llm-jp/Dense-btx-english-expert-152M) |
| Code expert 152M | [Link](https://huggingface.co/llm-jp/Dense-btx-code-expert-152M) |
| Japanese expert 1.5B | [Link](https://huggingface.co/llm-jp/Dense-btx-japanese-expert-1.5B) |
| English expert 1.5B | [Link](https://huggingface.co/llm-jp/Dense-btx-english-expert-1.5B) |
| Code expert 1.5B | [Link](https://huggingface.co/llm-jp/Dense-btx-code-expert-1.5B) |
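
## Loading a checkpoint

The checkpoints listed above are Hugging Face repositories. As a minimal sketch of how one might load and sample from them, assuming the repositories are compatible with the standard `transformers` Auto classes (the exact loading path for the MoE checkpoints may differ depending on their architecture implementation):

```python
# Minimal sketch: load one of the checkpoints listed above with the
# standard transformers Auto classes. Assumption: the repository is in
# a transformers-compatible format; MoE checkpoints may additionally
# require trust_remote_code=True depending on how their architecture
# is registered.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "llm-jp/Dense-152M"  # any repository name from the tables above

tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

prompt = "Mixture of Experts models"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```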
## How to cite

If you find our work helpful, please feel free to cite it.

```bibtex
@inproceedings{
nakamura2025dropupcycling,
title={Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization},
author={Taishi Nakamura and Takuya Akiba and Kazuki Fujii and Yusuke Oda and Rio Yokota and Jun Suzuki},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=gx1wHnf5Vp}
}
```