Will there be a 3b model later?
If you know of a good 3B MoE model, feel free to throw it at us.
https://huggingface.co./ibm-granite/granite-3.0-3b-a800m-base
https://huggingface.co./ibm-granite/granite-3.0-3b-a800m-base
@Fizzarolli you are summoned
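(For anyone curious about the naming: "3b-a800m" means roughly 3B total parameters with ~800M active per token. A cheap way to confirm the MoE layout is to pull just the config; the attribute names below are my guess at Mixtral-style naming and may differ for Granite, hence the getattr fallbacks:)

```python
from transformers import AutoConfig

# Cheap check: downloads only the config, not the weights.
cfg = AutoConfig.from_pretrained("ibm-granite/granite-3.0-3b-a800m-base")

# Mixtral-style MoE fields; exact names are an assumption here.
print("experts per layer:", getattr(cfg, "num_local_experts", "n/a"))
print("experts per token:", getattr(cfg, "num_experts_per_tok", "n/a"))
```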
@win10 i'll look into it! although, tbh, i do somewhat feel that 1b total and 7b total covers the range well enough (especially taking quants into account) that im not tooooo sure its worth it
If I'm bored at some point I can throw it on the cooktop, depending on how long the 7B took to do
like, 8 hours on 1xA100 D:
If we cut that roughly in half, 4-5 hours, that ain't much
I'll move this over to our chat lol
This model is very fast; I think it's the fastest MoE model of this size you can train on a 3090 so far.
If it'll train on a 3090, then my 4090 should make easy work of it. I'll have a play!
@inflatebot has done it!!!!! https://huggingface.co./allura-org/MoE-Girl-800MA-3BT
i did done it!
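For anyone who wants to give it a spin, here's a minimal sketch (assuming the repo loads with the standard transformers classes and the tokenizer ships a chat template from the finetune):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "allura-org/MoE-Girl-800MA-3BT"  # 800M active / 3B total, per the name
tok = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")

# Assumes a chat template is bundled with the tokenizer.
msgs = [{"role": "user", "content": "hi! introduce yourself?"}]
ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=64)
print(tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True))
```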