---
language:
- en
license: apache-2.0
tags:
- moe
inference: false
---

# Model Card for TinyMixtral-x8-Clonebase-7b

This model is based on [TinyLlama-1.1B](https://huggingface.co./TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T), converted to a Mistral model, with the FFN then cloned into a Mixtral architecture.

**This model was created experimentally as a base for training a small Mixtral.**

**Without further training, this model performs identically to TinyLlama.**
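
One way to check this claim (a hypothetical snippet, not part of the original card): load both models and compare next-token logits on the same prompt; with identical experts and uniform gates they should agree up to floating-point noise.

~~~python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Loading both models in float32 needs a high-memory machine (roughly 30 GB of RAM).
base = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T")
moe = AutoModelForCausalLM.from_pretrained("mmnga/TinyMixtral-x8-Clonebase-7b")
tok = AutoTokenizer.from_pretrained("mmnga/TinyMixtral-x8-Clonebase-7b")

ids = tok("The quick brown fox", return_tensors="pt").input_ids
with torch.no_grad():
    max_diff = (base(ids).logits - moe(ids).logits).abs().max()
print(max_diff)  # expected to be ~0 if the clone preserves behavior
~~~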

# How it was made

First, since TinyLlama is a Llama model, I converted it to a Mistral model.

After that, I cloned each layer's FFN block to create the experts.

Since the experts are all copies of the same tensors, performance does not change.

All gate weights share the same value, so every expert receives the same routing score.
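
Conceptually, the cloning step looks like the sketch below. This is a simplified illustration of the idea, not the exact code from the conversion notebook linked in the next section; it assumes the standard `transformers` weight names, where Mixtral's `w1`/`w3`/`w2` play the roles of Mistral's `gate_proj`/`up_proj`/`down_proj`, and it zero-initializes the gate as one way to give every expert the same score.

~~~python
import torch

def clone_ffn_into_experts(mistral_state, num_layers, num_experts=8):
    """Copy non-FFN tensors through and clone each layer's FFN into all expert slots."""
    mixtral_state = {k: v for k, v in mistral_state.items() if ".mlp." not in k}
    for i in range(num_layers):
        p = f"model.layers.{i}"
        gate_proj = mistral_state[f"{p}.mlp.gate_proj.weight"]
        up_proj = mistral_state[f"{p}.mlp.up_proj.weight"]
        down_proj = mistral_state[f"{p}.mlp.down_proj.weight"]
        for e in range(num_experts):
            x = f"{p}.block_sparse_moe.experts.{e}"
            mixtral_state[f"{x}.w1.weight"] = gate_proj.clone()  # w1 <- gate_proj
            mixtral_state[f"{x}.w3.weight"] = up_proj.clone()    # w3 <- up_proj
            mixtral_state[f"{x}.w2.weight"] = down_proj.clone()  # w2 <- down_proj
        # Identical gate rows give every expert the same routing score, so the
        # top-k mixture of identical experts reproduces the original FFN output.
        mixtral_state[f"{p}.block_sparse_moe.gate.weight"] = torch.zeros(num_experts, gate_proj.shape[1])
    return mixtral_state
~~~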

# How To Convert

Use a Colab CPU high-memory runtime.

This model was created with 8 experts, but since each expert is a clone, you can create as many experts as you like (see the configuration sketch below).

[tinyllama_to_mixtral_clonebase.ipynb](https://huggingface.co./mmnga/TinyMixtral-x8-Clonebase-7b/blob/main/notebook/tinyllama_to_mixtral_clonebase.ipynb)
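
For reference, here is a minimal sketch of what the target configuration could look like, using TinyLlama-1.1B's published dimensions (an illustration, not the notebook's exact code); `num_local_experts` is the value to change for a different expert count.

~~~python
from transformers import MixtralConfig

config = MixtralConfig(
    vocab_size=32000,
    hidden_size=2048,
    intermediate_size=5632,
    num_hidden_layers=22,
    num_attention_heads=32,
    num_key_value_heads=4,
    num_local_experts=8,    # any number works, since every expert is a clone
    num_experts_per_tok=2,
)
~~~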

# Revisions

- main: [TinyLlama-1.1B-intermediate-step-1431k-3T](https://huggingface.co./mmnga/TinyMixtral-x8-Clonebase-7b)
- old: [TinyLlama-1.1B-intermediate-step-1195k-token-2.5T](https://huggingface.co./mmnga/TinyMixtral-x8-Clonebase-7b/tree/TinyLlama-1.1B-intermediate-step-1195k-token-2.5T)

# Usage

~~~bash
pip install transformers --upgrade
pip install flash_attn bitsandbytes accelerate
~~~

~~~python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name_or_path = "mmnga/TinyMixtral-x8-Clonebase-7b"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
# load_in_8bit quantizes the weights with bitsandbytes to reduce memory use
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto", load_in_8bit=True)

prompt = "Introducing the recipe for today's dinner."

with torch.no_grad():
    token_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(
        token_ids.to(model.device),
        do_sample=True,
        max_new_tokens=128,
        repetition_penalty=1.5,
    )

output = tokenizer.decode(output_ids[0])
print(output)
~~~