Will there be a 3b model later?

#1 by win10 - opened

Will there be a 3b model later?

Allura org

If you know of a good 3B MoE model, feel free to throw it at us.

> If you know of a good 3B MoE model, feel free to throw it at us.

https://huggingface.co./ibm-granite/granite-3.0-3b-a800m-base

> Will there be a 3b model later?

@win10 I'll look into it! Although, tbh, I do somewhat feel that 1B total and 7B total cover the range well enough (especially taking quants into account) that I'm not tooooo sure it's worth it.

Allura org

If I'm bored at some point I can throw it on the cooktop, depending on how long the 7B took to do

Allura org

like, 8 hours on 1xA100 D:

If we cut that roughly in half, 4-5 hours, that ain't much
I'll move this over to our chat lol

> Will there be a 3b model later?
>
> @win10 I'll look into it! Although, tbh, I do somewhat feel that 1B total and 7B total cover the range well enough (especially taking quants into account) that I'm not tooooo sure it's worth it.

This model is very fast; I think it's the fastest MoE model of its size trained on a 3090 so far.
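For anyone who wants to check a speed claim like this, here's a rough single-GPU throughput sketch (the model id is a stand-in for whichever checkpoint you're measuring, and numbers will vary with hardware, dtype, and generation settings):

```python
# Rough tokens/sec measurement for greedy generation on one GPU.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-3.0-3b-a800m-base"  # stand-in; swap in the model under discussion
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda()

inputs = tokenizer("Benchmark prompt:", return_tensors="pt").to("cuda")
torch.cuda.synchronize()
start = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / elapsed:.1f} tokens/sec")
```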

Allura org

If it'll train on a 3090, then my 4090 should make easy work of it. I'll have a play!

Allura org

I done did it!
