Update README.md
README.md
CHANGED
@@ -22,7 +22,7 @@ and [tiny-llama-1.1b-chat-medical](https://huggingface.co/SumayyaAli/tiny-llama-

OpenOrca experts have been given the task of creating responses for simple questions about things like pop culture, history, and science...step-1195k experts have been chosen to provide warmth and a positive environment, while chat-medical experts have been chosen to provide further detail about human subjects, and to give small little bits of medical advice: I.E. "how do I get rid of this headache I gave myself from making you?"

-# [What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)
+# "[What is a Mixture of Experts (MoE)?](https://huggingface.co/blog/moe)"
### (from the MistralAI papers...click the quoted question above to navigate to it directly.)

The scale of a model is one of the most important axes for better model quality. Given a fixed computing budget, training a larger model for fewer steps is better than training a smaller model for more steps.
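
For readers who don't follow the linked blog post, the routing idea the README points at can be sketched in a few lines. The snippet below is only an assumption-laden illustration: the expert count, hidden size, and random weights are made up, and it is not the routing code of this model or its merge tooling. It just shows how a router can send each token to a small top-k subset of experts, so total capacity grows without a matching growth in per-token compute.

```python
# Minimal sketch of sparse MoE routing: a router scores every expert for each
# token and only the top-k experts actually run. The three experts loosely
# stand in for the OpenOrca, step-1195k, and chat-medical sources named above;
# the tiny random weights are placeholders, not the real model.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 3          # e.g. OpenOrca, step-1195k chat, chat-medical
HIDDEN = 8               # toy hidden size
TOP_K = 2                # only k experts are evaluated per token

router_w = rng.normal(size=(HIDDEN, NUM_EXPERTS))           # gating weights
expert_w = rng.normal(size=(NUM_EXPERTS, HIDDEN, HIDDEN))   # one toy FFN per expert

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token (row of x) to its top-k experts and mix their outputs."""
    logits = x @ router_w                                    # (tokens, experts)
    out = np.zeros_like(x)
    for t, row in enumerate(logits):
        top = np.argsort(row)[-TOP_K:]                       # chosen expert indices
        weights = np.exp(row[top]) / np.exp(row[top]).sum()  # softmax over chosen only
        for w, e in zip(weights, top):
            out[t] += w * (x[t] @ expert_w[e])               # weighted expert output
    return out

tokens = rng.normal(size=(4, HIDDEN))                        # 4 toy "tokens"
print(moe_layer(tokens).shape)                               # (4, 8)
```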