🧪 Just Another Model Experiment

This is one of many experimental iterations I'm sharing publicly while I mess around with training parameters and ideas. It's not a "real" release - just me being transparent about my learning process. Feel free to look under the hood, but don't expect anything production-ready!

SmolNemo-12B-FFT-experimental

Mahou-1.5-mistral-nemo-12B-lorablated finetuned on HuggingFaceTB/smoltalk.

This model has erratic behavior and poor performance

Method

SFT with 8x A100 for 0.1 epochs.

This was a full finetune. I think the issues with the model can be chalked up to conflicts with Mistral Instruct and ChatML.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	8.32
IFEval (0-Shot)	33.48
BBH (3-Shot)	6.54
MATH Lvl 5 (4-Shot)	0.23
GPQA (0-shot)	1.34
MuSR (0-shot)	5.92
MMLU-PRO (5-shot)	2.41

Dataset used to train nbeerbower/SmolNemo-12B-FFT-experimental

Evaluation results

strict accuracy on IFEval (0-Shot)
Open LLM Leaderboard

33.480
normalized accuracy on BBH (3-Shot)
Open LLM Leaderboard

6.540
exact match on MATH Lvl 5 (4-Shot)
Open LLM Leaderboard

0.230
acc_norm on GPQA (0-shot)
Open LLM Leaderboard

1.340
acc_norm on MuSR (0-shot)
Open LLM Leaderboard

5.920
accuracy on MMLU-PRO (5-shot)
test set Open LLM Leaderboard

2.410

View on Papers With Code

nbeerbower
/

SmolNemo-12B-FFT-experimental

SmolNemo-12B-FFT-experimental

Method

Open LLM Leaderboard Evaluation Results

Model tree for nbeerbower/SmolNemo-12B-FFT-experimental

Dataset used to train nbeerbower/SmolNemo-12B-FFT-experimental

Evaluation results