Update README.md
README.md CHANGED
@@ -8,6 +8,10 @@ It is based on [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
 was created with the help of [mergekit](https://github.com/arcee-ai/mergekit).
 This is the mergekit configuration we used: [mergekit_moe_config.yml](https://huggingface.co/deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw/blob/main/mergekit_moe_config.yml)
 
+It should be noted that this model is the raw model after merging.
+It still has randomly initialized router networks and will not be better than a single one of its expert models.
+This model requires further training before use.
+
 ## Licensing
 
 This model is licensed under the Llama 3.1 Community License, Copyright (c) 2024 [Philip May](https://philipmay.org), [Deutsche Telekom AG](https://www.telekom.de/)\
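For readers unfamiliar with mergekit's MoE workflow: a `mergekit-moe` run is driven by a YAML file of roughly the shape sketched below. This is an illustrative sketch only, not the linked `mergekit_moe_config.yml`; the expert entries are placeholders, and `gate_mode: random` is an assumption inferred from the note about randomly initialized router networks.

```yaml
# Illustrative sketch of a mergekit-moe config (assumptions noted inline);
# the configuration actually used is the linked mergekit_moe_config.yml.
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
gate_mode: random   # assumption: consistent with the "randomly initialized router networks" note
dtype: bfloat16     # assumption: a common dtype choice for Llama 3.1 merges
experts:            # eight expert entries would give the 8x8B shape; IDs below are placeholders
  - source_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  - source_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  # ... six more expert entries ...
```

With mergekit installed, a file like this is typically applied with `mergekit-moe mergekit_moe_config.yml ./output-model-dir`.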