meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D20001

This model is a fine-tuned version of unsloth/meta-llama-3.1-8b-instruct-bnb-4bit on the D20001 dataset.
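A minimal loading sketch, assuming this repo hosts a PEFT (LoRA) adapter on top of the 4-bit base model, as the listed framework versions suggest; this is not an official usage snippet, and everything except the two repo ids follows the standard PEFT loading pattern:

```python
# Sketch only: load the 4-bit base model (requires bitsandbytes) and
# attach this repo's adapter on top of it. Assumes this repo is a PEFT
# adapter; repo ids are taken from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/meta-llama-3.1-8b-instruct-bnb-4bit",
    device_map="auto",
)
model = PeftModel.from_pretrained(
    base,
    "clembench-playpen/meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D20001",
)
tokenizer = AutoTokenizer.from_pretrained(
    "unsloth/meta-llama-3.1-8b-instruct-bnb-4bit"
)
```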

Model description

The model is trained only on successful episodes produced by the top 10 models in versions 0.9 and 1.0 of the clembench benchmark. Success was measured as the highest number of overall successful episodes across all games; the resulting ranking is shown below.

| Rank | Model pairing |
|------|---------------|
| 1 | gpt-4-0613-t0.0--gpt-4-0613-t0.0 |
| 2 | claude-v1.3-t0.0--claude-v1.3-t0.0 |
| 3 | gpt-4-1106-preview-t0.0--gpt-4-1106-preview-t0.0 |
| 4 | gpt-4-t0.0--gpt-4-t0.0 |
| 5 | gpt-4-0314-t0.0--gpt-4-0314-t0.0 |
| 6 | claude-2.1-t0.0--claude-2.1-t0.0 |
| 7 | gpt-4-t0.0--gpt-3.5-turbo-t0.0 |
| 8 | claude-2-t0.0--claude-2-t0.0 |
| 9 | gpt-3.5-turbo-1106-t0.0--gpt-3.5-turbo-1106-t0.0 |
| 10 | gpt-3.5-turbo-0613-t0.0--gpt-3.5-turbo-0613-t0.0 |

Intended uses & limitations

More information needed

Training and evaluation data

Training data: D20001

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 7331
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • lr_scheduler_warmup_steps: 5
  • num_epochs: 1
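
The values above map one-to-one onto standard transformers.TrainingArguments fields. A minimal sketch of how they might have been passed is shown below; the actual training script is not published, so output_dir and any field not listed above are illustrative assumptions:

```python
# Sketch only: maps the listed hyperparameters onto transformers.TrainingArguments.
# output_dir and anything not stated in the card are illustrative assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-instruct-sft-e1-d20001",  # assumed name
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=7331,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    warmup_steps=5,  # when > 0, warmup_steps takes precedence over warmup_ratio
    num_train_epochs=1,
)
```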

Training results

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1