Possibility of an 8x7B finetune
Could a finetune of mistralai/Mixtral-8x7B-v0.1 and mistralai/Mixtral-8x7B-Instruct-v0.1 be done? I would say they are better base models for making a Japanese LLM: they already seem to have some limited ability in the language as is, and they seem very smart in English. Not to mention MoE makes things run much faster; I get around 0.70 T/s on CPU with a 70B but easily 3 T/s with an 8x7B MoE model. However, I am not sure whether opting for a different vocab size, as this model seemingly does, would make the knowledge already in the base model null and void. There is also Miqu, which many have reported to be good in Japanese, but since that model is a leak it is legally questionable (although Mistral AI doesn't seem to mind much), so in the end it's not practical to use, not to mention it only exists as GGUF, with attempted dequantizations but no real full-precision model.
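To be concrete about the vocab concern: my (possibly wrong) understanding is that extending the tokenizer doesn't have to throw away the base weights, since the existing embedding rows are kept and only the new tokens start untrained. A rough sketch with transformers, where the added tokens are just placeholders I made up, not what this project actually does:

```python
# Minimal sketch, assuming a standard transformers workflow;
# the extra tokens below are placeholders, not a real Japanese vocab extension.
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical extra Japanese tokens; a real extension would add thousands
# learned from a Japanese corpus.
num_added = tokenizer.add_tokens(["こんにちは", "ありがとう"])

# Existing embedding rows are preserved; only the newly added rows are
# freshly initialized, so the base model's knowledge isn't simply wiped out.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens, vocab size is now {len(tokenizer)}")
```

Of course, how much continued pretraining it then takes for the new tokens to actually be useful is another question.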
Thank you for your question.
We have already started to develop a model based on Mixtral.
At this time, we cannot provide information on whether or not the model will be released, or a specific timeline if it is to be released.
Thanks. Also, just as a suggestion for a next model, I think training it on examples of grammar breakdowns or explanations of Japanese grammar would be nice. This is personally the use case I would like a Japanese LLM for: basically a language tutor. I honestly wonder if examples like that could even improve its general ability, though I'm not sure.
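Just to make that concrete, here's the sort of sample I'm imagining (entirely made up, not from any real dataset):

```python
# Purely hypothetical example of a grammar-breakdown training sample,
# written as a simple instruction/response pair.
import json

sample = {
    "instruction": "Break down the grammar of this sentence: 本を読んでいました。",
    "response": (
        "本 (hon) means 'book' and is marked as the direct object by を. "
        "読んで is the te-form of 読む (to read), and いました is the polite past "
        "of いる, so 読んでいました means 'was reading'. "
        "Altogether: 'I was reading a book.'"
    ),
}
print(json.dumps(sample, ensure_ascii=False, indent=2))
```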