Update README.md
README.md CHANGED
@@ -8,6 +8,10 @@ It is based on [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
 was created with the help of [mergekit](https://github.com/arcee-ai/mergekit).
 This is the mergekit configuration we used: [mergekit_moe_config.yml](https://huggingface.co/deutsche-telekom/Llama-3.1-MoE-8x8B-Instruct-raw/blob/main/mergekit_moe_config.yml)
 
+It should be noted that this model is the raw model after merging.
+It still has randomly initialized router networks and will not be better than a single one of its expert models.
+This model requires further training before use.
+
 ## Licensing
 
 This model is licensed under the Llama 3.1 Community License, Copyright (c) 2024 [Philip May](https://philipmay.org), [Deutsche Telekom AG](https://www.telekom.de/)\
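For readers unfamiliar with mergekit's MoE workflow: a `mergekit-moe` run is driven by a YAML file of roughly the shape sketched below. This is an illustrative sketch only, not the linked `mergekit_moe_config.yml`; the expert entries are placeholders, and `gate_mode: random` is an assumption inferred from the note about randomly initialized router networks.

```yaml
# Illustrative sketch of a mergekit-moe config (assumptions noted inline);
# the configuration actually used is the linked mergekit_moe_config.yml.
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
gate_mode: random   # assumption: consistent with the "randomly initialized router networks" note
dtype: bfloat16     # assumption: a common dtype choice for Llama 3.1 merges
experts:            # eight expert entries would give the 8x8B shape; IDs below are placeholders
  - source_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  - source_model: meta-llama/Meta-Llama-3.1-8B-Instruct
  # ... six more expert entries ...
```

With mergekit installed, a file like this is typically applied with `mergekit-moe mergekit_moe_config.yml ./output-model-dir`.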