llm3br256

This model is a fine-tuned version of meta-llama/Llama-3.2-3B-Instruct on the antrepriz dataset. It achieves the following results on the evaluation set:

Loss: 0.0119

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 32
optimizer: AdamW (PyTorch implementation, OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 5.0
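These settings interact as follows. The effective batch size is train_batch_size × gradient_accumulation_steps = 4 × 8 = 32, which is consistent with training on a single device. The cosine scheduler with a warmup ratio of 0.1 ramps the learning rate up linearly over the first 10% of optimizer steps and then decays it along a half cosine toward zero. The plain-Python sketch below reproduces that shape. It rests on two assumptions that this card does not state: the run has 145 optimizer steps in total (the last step in the training results), and warmup steps are rounded up as ceil(0.1 × 145) = 15. The helper names are illustrative only.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-4
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_RATIO = 0.1

# Assumption: 145 optimizer steps in total (last step logged in the training results).
TOTAL_STEPS = 145


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer update."""
    return per_device * accumulation * num_devices


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Convert a warmup ratio to a whole number of steps, rounding up (assumed convention)."""
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int, warmup: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay from peak_lr to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
    print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    print("warmup steps:", warmup)
    for step in (0, 5, warmup, 80, TOTAL_STEPS):
        lr = cosine_lr(step, TOTAL_STEPS, warmup, LEARNING_RATE)
        print(f"step {step:3d}: lr = {lr:.2e}")
```

Under these assumptions the learning rate peaks at 1e-4 around step 15, falls to half that value near step 80, and reaches zero at the final step.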

Training results

Training Loss	Epoch	Step	Validation Loss
0.0434	0.1717	5	0.0371
0.0304	0.3433	10	0.0272
0.0224	0.5150	15	0.0224
0.0193	0.6867	20	0.0202
0.0167	0.8584	25	0.0188
0.0474	1.0300	30	0.0174
0.0161	1.2017	35	0.0167
0.013	1.3734	40	0.0159
0.0134	1.5451	45	0.0151
0.0125	1.7167	50	0.0145
0.0126	1.8884	55	0.0140
0.0112	2.0601	60	0.0137
0.0118	2.2318	65	0.0134
0.0098	2.4034	70	0.0130
0.0124	2.5751	75	0.0128
0.0128	2.7468	80	0.0126
0.0103	2.9185	85	0.0125
0.0112	3.0901	90	0.0122
0.0088	3.2618	95	0.0121
0.0092	3.4335	100	0.0122
0.0087	3.6052	105	0.0120
0.0091	3.7768	110	0.0118
0.0094	3.9485	115	0.0117
0.0088	4.1202	120	0.0118
0.0084	4.2918	125	0.0118
0.0083	4.4635	130	0.0119
0.0077	4.6352	135	0.0119
0.0081	4.8069	140	0.0119
0.0079	4.9785	145	0.0119
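The evaluation loss reported at the top of this card (0.0119) equals the validation loss in the last row. The lowest value in the table is 0.0117, at step 115. This suggests the card reports the final checkpoint rather than the best one. The minimal sketch below checks this by parsing the rows above, copied verbatim in whitespace-separated form, and selecting the row with the lowest validation loss. The `Row`, `parse_rows`, and `best_checkpoint` names are hypothetical helpers for illustration.

```python
from typing import List, NamedTuple

# (training loss, epoch, step, validation loss) rows copied from the table above.
ROWS = """
0.0434 0.1717 5 0.0371
0.0304 0.3433 10 0.0272
0.0224 0.5150 15 0.0224
0.0193 0.6867 20 0.0202
0.0167 0.8584 25 0.0188
0.0474 1.0300 30 0.0174
0.0161 1.2017 35 0.0167
0.013 1.3734 40 0.0159
0.0134 1.5451 45 0.0151
0.0125 1.7167 50 0.0145
0.0126 1.8884 55 0.0140
0.0112 2.0601 60 0.0137
0.0118 2.2318 65 0.0134
0.0098 2.4034 70 0.0130
0.0124 2.5751 75 0.0128
0.0128 2.7468 80 0.0126
0.0103 2.9185 85 0.0125
0.0112 3.0901 90 0.0122
0.0088 3.2618 95 0.0121
0.0092 3.4335 100 0.0122
0.0087 3.6052 105 0.0120
0.0091 3.7768 110 0.0118
0.0094 3.9485 115 0.0117
0.0088 4.1202 120 0.0118
0.0084 4.2918 125 0.0118
0.0083 4.4635 130 0.0119
0.0077 4.6352 135 0.0119
0.0081 4.8069 140 0.0119
0.0079 4.9785 145 0.0119
"""


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float


def parse_rows(text: str) -> List[Row]:
    """Turn whitespace-separated table lines into typed rows."""
    rows = []
    for line in text.strip().splitlines():
        train, epoch, step, val = line.split()
        rows.append(Row(float(train), float(epoch), int(step), float(val)))
    return rows


def best_checkpoint(rows: List[Row]) -> Row:
    """Return the first row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


if __name__ == "__main__":
    rows = parse_rows(ROWS)
    best, final = best_checkpoint(rows), rows[-1]
    print(f"best:  step {best.step} (epoch {best.epoch}), val loss {best.val_loss}")
    print(f"final: step {final.step} (epoch {final.epoch}), val loss {final.val_loss}")
```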

Framework versions

PEFT 0.12.0
Transformers 4.46.1
Pytorch 2.4.0+cu121
Datasets 3.1.0
Tokenizers 0.20.3
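To compare a local environment with the versions above, a small standard-library sketch based on `importlib.metadata` can help. It assumes the usual PyPI distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`). Packages that are not installed are reported as missing instead of raising an error. The `compare_versions` helper is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Versions listed under "Framework versions"; keys are assumed PyPI distribution names.
EXPECTED = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.4.0+cu121",
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}


def compare_versions(expected: Dict[str, str]) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed or None)} for every package that does not match exactly."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed: Optional[str] = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in compare_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed or 'not installed'}")
```

The comparison uses exact strings. A PyTorch build without the `+cu121` local tag (for example, a CPU-only build) is therefore flagged as a mismatch even if its release number is 2.4.0.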

Repository: sizhkhy/antrepriz