meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D20001

This model is a fine-tuned version of unsloth/meta-llama-3.1-8b-instruct-bnb-4bit on the D20001 dataset.
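A minimal loading sketch, assuming this repo hosts a PEFT (LoRA) adapter on top of the 4-bit base model, as the listed framework versions suggest; this is not an official usage snippet, and everything except the two repo ids follows the standard PEFT loading pattern:

```python
# Sketch only: load the 4-bit base model (requires bitsandbytes) and
# attach this repo's adapter on top of it. Assumes this repo is a PEFT
# adapter; repo ids are taken from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/meta-llama-3.1-8b-instruct-bnb-4bit",
    device_map="auto",
)
model = PeftModel.from_pretrained(
    base,
    "clembench-playpen/meta-llama-Meta-Llama-3.1-8B-Instruct_SFT_E1_D20001",
)
tokenizer = AutoTokenizer.from_pretrained(
    "unsloth/meta-llama-3.1-8b-instruct-bnb-4bit"
)
```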

Model description

The model is trained only on successful episodes produced by the top 10 models in versions 0.9 and 1.0 of the clembench benchmark. Success was measured as the highest number of overall successful episodes across all games; the resulting ranking is shown below.

| Rank | Model pairing |
|------|---------------|
| 1 | gpt-4-0613-t0.0--gpt-4-0613-t0.0 |
| 2 | claude-v1.3-t0.0--claude-v1.3-t0.0 |
| 3 | gpt-4-1106-preview-t0.0--gpt-4-1106-preview-t0.0 |
| 4 | gpt-4-t0.0--gpt-4-t0.0 |
| 5 | gpt-4-0314-t0.0--gpt-4-0314-t0.0 |
| 6 | claude-2.1-t0.0--claude-2.1-t0.0 |
| 7 | gpt-4-t0.0--gpt-3.5-turbo-t0.0 |
| 8 | claude-2-t0.0--claude-2-t0.0 |
| 9 | gpt-3.5-turbo-1106-t0.0--gpt-3.5-turbo-1106-t0.0 |
| 10 | gpt-3.5-turbo-0613-t0.0--gpt-3.5-turbo-0613-t0.0 |

Intended uses & limitations

More information needed

Training and evaluation data

Training data: D20001

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 7331
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.03
  • lr_scheduler_warmup_steps: 5
  • num_epochs: 1
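
The values above map one-to-one onto standard transformers.TrainingArguments fields. A minimal sketch of how they might have been passed is shown below; the actual training script is not published, so output_dir and any field not listed above are illustrative assumptions:

```python
# Sketch only: maps the listed hyperparameters onto transformers.TrainingArguments.
# output_dir and anything not stated in the card are illustrative assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-instruct-sft-e1-d20001",  # assumed name
    learning_rate=2e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=7331,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    warmup_steps=5,  # when > 0, warmup_steps takes precedence over warmup_ratio
    num_train_epochs=1,
)
```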

Training results

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.2
  • Pytorch 2.4.0+cu121
  • Datasets 2.21.0
  • Tokenizers 0.19.1