cc4a3486-a710-49ea-8182-a30d05a62e39

This model is a fine-tuned version of facebook/opt-1.3b on an unspecified dataset. Evaluation results are given as per-step validation loss in the table at the end of this card.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 0.000201
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 128
optimizer: OptimizerNames.ADAMW_BNB with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
lr_scheduler_warmup_steps: 50
training_steps: 400
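
The batch-size figures above are related by simple arithmetic, sketched below. This is an illustration rather than the original training script: the single-device setup (num_devices = 1) is an assumption inferred from 32 x 4 = 128, and the training_kwargs dictionary is a hypothetical mapping of the listed values onto keyword names accepted by transformers.TrainingArguments. The optim string is my reading of OptimizerNames.ADAMW_BNB.

```python
# Values copied from the hyperparameter list above.
per_device_train_batch_size = 32
gradient_accumulation_steps = 4
training_steps = 400

# Assumption: one device, since 32 * 4 already equals the reported 128.
num_devices = 1

# Effective batch size per optimizer step.
total_train_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Number of training examples processed over the whole run.
examples_seen = total_train_batch_size * training_steps

# Hypothetical keyword arguments for transformers.TrainingArguments that
# would correspond to the list above (not taken from the original script).
training_kwargs = {
    "learning_rate": 0.000201,
    "per_device_train_batch_size": per_device_train_batch_size,
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "gradient_accumulation_steps": gradient_accumulation_steps,
    "optim": "adamw_bnb_8bit",  # assumed string form of OptimizerNames.ADAMW_BNB
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "warmup_steps": 50,  # recorded in the card; a plain constant schedule may not use it
    "max_steps": training_steps,
}

print(f"effective batch size: {total_train_batch_size}")
print(f"examples seen in {training_steps} steps: {examples_seen}")
```

If transformers is installed, the dictionary could be passed as TrainingArguments(output_dir=..., **training_kwargs), with output_dir chosen by the user.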

Training Loss	Epoch	Step	Validation Loss
No log	0.0028	1	1.0241
0.52	0.1379	50	0.1036
0.5061	0.2759	100	0.1104
0.1155	0.4138	150	0.0543
0.3057	0.5517	200	0.0534
0.0765	0.6897	250	0.0592
0.3329	0.8276	300	0.0546
0.0479	0.9655	350	0.0536
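
The table can also be read programmatically. The sketch below finds the logged checkpoint with the lowest validation loss and estimates steps per epoch from the last logged row. The training-set size derived from it is only an approximation: it assumes the effective batch size of 128 reported above and ignores rounding in the epoch column and any dropped partial batch.

```python
# (training_loss, epoch, step, validation_loss), copied from the table above.
# "No log" is represented as None.
rows = [
    (None,   0.0028,   1, 1.0241),
    (0.52,   0.1379,  50, 0.1036),
    (0.5061, 0.2759, 100, 0.1104),
    (0.1155, 0.4138, 150, 0.0543),
    (0.3057, 0.5517, 200, 0.0534),
    (0.0765, 0.6897, 250, 0.0592),
    (0.3329, 0.8276, 300, 0.0546),
    (0.0479, 0.9655, 350, 0.0536),
]

# Logged evaluation point with the lowest validation loss.
best = min(rows, key=lambda r: r[3])
best_step, best_val_loss = best[2], best[3]

# Estimate steps per epoch from the last row, which has the smallest
# relative rounding error in the epoch column.
last_epoch, last_step = rows[-1][1], rows[-1][2]
steps_per_epoch = last_step / last_epoch

# Rough training-set size, assuming the effective batch size of 128.
total_train_batch_size = 128
approx_train_examples = steps_per_epoch * total_train_batch_size

# Fraction of epochs covered by the configured 400 training steps.
epochs_at_400_steps = 400 / steps_per_epoch

print(f"best validation loss {best_val_loss} at step {best_step}")
print(f"about {steps_per_epoch:.1f} steps per epoch")
print(f"about {approx_train_examples:.0f} training examples")
print(f"400 steps is about {epochs_at_400_steps:.2f} epochs")
```

By this estimate the run covers slightly more than one pass over the training data, and validation loss is roughly flat from step 150 onward.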