OrionZheng committed • Commit c3ee31d • Parent(s): ab521e2
Upload README.md

README.md CHANGED
@@ -24,15 +24,15 @@ Currently, three models are released in total: OpenMoE-base, OpenMoE-8B (and its
We provide all these checkpoints on Hugging Face (in PyTorch) and Google Cloud Storage (in JAX).

| Model Name | Description | #Param | Hugging Face |
+|----------------|-------------------------------------------------|----------|-------------|
| OpenMoE-base | A small MoE model for debugging (trained on only 128B tokens) | 637M | [Link](https://huggingface.co/OrionZheng/openmoe-base) |
| OpenLLaMA-base | A dense counterpart of OpenMoE-base | 310M | [Link](https://huggingface.co/fuzhao/OpenLLaMA_Base) |
| OpenMoE-8B-200B | 8B MoE with FLOPs comparable to a 1.6B LLaMA (no SFT) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b-200B/tree/main) |
| OpenMoE-8B-890B | 8B MoE with FLOPs comparable to a 1.6B LLaMA (no SFT) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b-890B) |
| **OpenMoE-8B-1.1T** | 8B MoE with FLOPs comparable to a 1.6B LLaMA (no SFT) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b) |
| **OpenMoE-8B-Chat (1.1T+SFT)** | OpenMoE-8B-1.1T supervised finetuned on the [WildChat GPT-4 Subset](https://huggingface.co/datasets/allenai/WildChat-nontoxic) | 8B | [Link](https://huggingface.co/OrionZheng/openmoe-8b-chat) |
| **OpenMoE-34B/32E (200B)** | 34B MoE with FLOPs comparable to a 7B LLaMA (no SFT) | 34B | [Link](https://huggingface.co/OrionZheng/openmoe-34b-200B) |

The base models, which were trained on 128 billion tokens, served primarily for debugging purposes. After validating the effectiveness of our model architecture, we did not pursue further training. Consequently, their performance may not be very good, and these checkpoints are not suitable for practical applications.
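For context, loading one of the Hugging Face checkpoints listed above with the standard Transformers API might look like the sketch below. This is a minimal, unofficial sketch rather than the repository's documented snippet (only the tail of that snippet is visible in the second hunk); it assumes the checkpoint repo bundles its tokenizer and custom OpenMoE modeling code, hence `trust_remote_code=True`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Sketch only: load the smallest checkpoint from the table above.
# Assumption: the repo ships its tokenizer and custom modeling code.
model_id = "OrionZheng/openmoe-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # roughly halves memory versus float32
    trust_remote_code=True,      # assumed necessary for the custom MoE architecture
)

inputs = tokenizer("The sky is blue because", return_tensors="pt")
sample = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(sample[0]))
```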
@@ -78,11 +78,14 @@ sample = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(sample[0]))
```

+If you don't have a GPU at hand, don't worry! You can still try our model on Colab (note: this requires a $10 Colab Pro plan). You can experiment with OpenMoE-8B-Chat directly in [this Colab notebook](https://colab.research.google.com/drive/1xIfIVafnlCP2XVICmRwkUFK3cwTJYjCY).
- Running OpenMoE-8B requires ~49GB of memory in float32 or ~23GB in bfloat16. It can be executed on a Colab `CPU High-RAM` runtime (in float32) or an `A100-40GB` runtime (in bfloat16), both of which require Colab Pro. float16 precision is not recommended because it sometimes leads to performance degradation.
- Running OpenMoE-34B requires ~89GB of memory in bfloat16 or ~180GB in float32. To perform inference on multiple devices or offload model weights to RAM, please refer to the script [here](https://github.com/XueFuzhao/OpenMoE/blob/main/script/inference_on_multi_devices.py); a generic sketch of the same idea is shown after this list.
- A more detailed environment setup script can be found [here](https://github.com/XueFuzhao/OpenMoE/blob/main/env/prepare_env.sh), or, if you use Docker, you can refer to the Dockerfile [here](https://github.com/XueFuzhao/OpenMoE/blob/main/env/openmoe_infer_dockerfile). Note: you do not need the t5x and JAX dependencies if you are using our [Hugging Face checkpoints](https://huggingface.co/OrionZheng/openmoe-8b-chat) without converting the JAX checkpoints.
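As referenced in the second bullet above, the following is a generic, unofficial sketch of multi-device / CPU-offload loading using Transformers' `device_map` support (requires the `accelerate` package). It is not the repository's `inference_on_multi_devices.py` script; the model ID comes from the table above and the offload settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch only: shard a large checkpoint across available GPUs and spill the
# remainder to CPU RAM (and optionally disk). Requires `accelerate`.
model = AutoModelForCausalLM.from_pretrained(
    "OrionZheng/openmoe-34b-200B",
    torch_dtype=torch.bfloat16,   # ~89GB in bfloat16, per the note above
    device_map="auto",            # let Accelerate place layers on GPUs, then CPU
    offload_folder="offload",     # optional: offload what doesn't fit in RAM to disk
    trust_remote_code=True,       # assumed necessary for the custom MoE architecture
)
print(model.hf_device_map)        # inspect where each block was placed
```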

+In addition, we provide a Colab [tutorial](https://colab.research.google.com/drive/1eIT1rtG7pORRQAYtQoMOAekUg7aZLDdn) demonstrating the JAX checkpoint conversion.

## License