Poor writing quality
Some good news and some bad news. Someone will need to confirm my findings on this one, but the writing quality seems to be pretty poor with this model, worse than 1.3, which I already found worse than 1.2. I double-checked that it wasn't an issue on my end by rerunning the same prompts multiple times, and by testing those same prompts with other models (like 1.2, or spicy abliterated stella, which I know to be good), and it still came out pretty bad.
Here's the good news: I may have interested someone in adding some of your models and merges to their creative writing benchmarking suite: https://www.reddit.com/r/LocalLLaMA/comments/1cd2jco/comment/l6czbx6/
If you get a Yi 1.5 34B finetune out, it would be cool to see how it fares in that benchmark too.
On an offhand note, I saw that you used Daredevil abliterated as the base for 1.3 and for Daredevil Mahou. Maybe you'd be interested in using the (supposedly) better version that just came out: https://huggingface.co./mlabonne/NeuralDaredevil-8B-abliterated. Maybe a 1.2 + spicy stella merge on top of it as the base?
Yeah, I was disappointed with the performance of both 1.3 and 1.4 (at least for llama). I think it's part switching to ChatML tokens and part overfitting the training data.
A rebase and retrain is probably in order. Thanks for your suggestions!
Also exciting to hear about the benchmarking. The more datapoints and feedback the better!
"Rebranding" this as 1.3a :)
I will get to Yi 34B sometime within June btw
I wonder if Daredevil as a base just isn't good? I went ahead and tried the recipe I was curious about (NeuralDaredevil as base, 1.2 + spicy-abliterated-stella model stock merge). It was alright, a little better than 1.3 and 1.4, but not as good as 1.2 or spicy stella. I'd say it's about as good as Daredevil Mahou, which I guess is to be expected since it's a similar recipe.
https://huggingface.co./lemon07r/llama-3-NeuralMahou
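In case anyone wants to reproduce a recipe like this, a model_stock merge is usually expressed as a mergekit config. Here's a rough sketch in Python that writes the config and calls the mergekit-yaml CLI; only the NeuralDaredevil repo ID is confirmed above, so treat the other two model names and the output path as assumptions and double-check the exact repo IDs on HF.

```python
import subprocess
import textwrap

# Rough sketch of the recipe above as a mergekit model_stock config.
# Only the base_model repo ID is linked in this thread; the other two
# model names are assumptions -- check the exact repo IDs on Hugging Face.
config = textwrap.dedent("""\
    base_model: mlabonne/NeuralDaredevil-8B-abliterated
    merge_method: model_stock
    dtype: bfloat16
    models:
      - model: flammenai/Mahou-1.2-llama3-8B
      - model: nbeerbower/llama-3-spicy-abliterated-stella-8B
""")

with open("neuralmahou.yml", "w") as f:
    f.write(config)

# mergekit-yaml <config> <output_dir> builds the merged model.
subprocess.run(["mergekit-yaml", "neuralmahou.yml", "./llama-3-NeuralMahou"], check=True)
```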
I have quants I made with the latest llama.cpp after converting to fp32, but my current connection is slow, so they would take a while to upload. I may do it overnight if there's interest. I'd like to figure out a good merge recipe using the two models I really like, then maybe train on the creativegpt dataset to see what I get, but I don't know what to try next.
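The quant workflow described there (convert to a full-precision GGUF, then quantize) looks roughly like the sketch below. Script and binary names change between llama.cpp versions (quantize vs llama-quantize, convert-hf-to-gguf.py vs convert_hf_to_gguf.py), so the exact paths here are assumptions.

```python
import subprocess

# Step 1: convert the HF checkpoint to a full-precision (f32) GGUF.
subprocess.run([
    "python", "convert-hf-to-gguf.py", "./llama-3-NeuralMahou",
    "--outtype", "f32", "--outfile", "NeuralMahou-f32.gguf",
], check=True)

# Step 2: quantize the f32 GGUF down to a smaller quant (Q6_K as an example).
subprocess.run([
    "./llama-quantize", "NeuralMahou-f32.gguf", "NeuralMahou-Q6_K.gguf", "Q6_K",
], check=True)
```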
Good to know!
I'm uploading a model stock merge of NeuralDevil, Spicy Stella, Devil-Mahou, and Stheno-Mahou: https://huggingface.co./nbeerbower/llama-3-SNAMD-8B
eta 2 hours?
Then I'll probably merge that with Dolphin (https://huggingface.co./cognitivecomputations/dolphin-2.9-llama3-8b), then do a finetune to try and make it closer to the Mistral model before retraining Mahou.
I'm testing Stheno Mahou right now; it's pretty decent but not amazing. Maybe because Stheno is trained on the Llama 3 chat format? I tested the model with both formats: it had a much larger vocabulary with Llama 3, but ChatML gave more, or you could say better, creative writing. Hopefully your merge turns out better than mine, but I don't have my hopes too high since, at least with Mistral, I found that merging too many models usually ends up not as good as merging just a few good ones (but then again, spicy stella turned out very good).
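For reference, these are roughly the two prompt templates being compared above; a minimal sketch with the special tokens only, assuming standard ChatML and Llama 3 Instruct formatting (real use should go through the model's tokenizer chat template).

```python
# Minimal sketch of the two prompt formats compared above.
# Real use should apply the model's tokenizer chat template instead.

def chatml_prompt(user_msg: str) -> str:
    # ChatML-style turn markers.
    return (
        "<|im_start|>user\n" + user_msg + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def llama3_prompt(user_msg: str) -> str:
    # Llama 3 Instruct-style headers.
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        + user_msg + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(chatml_prompt("Write a short story about a lighthouse keeper."))
print(llama3_prompt("Write a short story about a lighthouse keeper."))
```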