See axolotl config

axolotl version: 0.4.1

base_model: NousResearch/Meta-Llama-3-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: QTeam/htxllama_1
    type: alpaca
dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/llama3-8b-ht-v1-2

sequence_len: 4096
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_modules_to_save:
  - embed_tokens
  - lm_head

wandb_project: "ft-llama3-8b-v1"
wandb_entity: "htxqteam1-htx"
wandb_watch: "all"
wandb_name: 
wandb_log_model: "never"

gradient_accumulation_steps: 1
micro_batch_size: 2
num_epochs: 10
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
s2_attention:

warmup_steps: 10
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
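
The adapter settings in the config (lora_r: 32, lora_alpha: 16) determine how strongly the low-rank update is weighted. The sketch below works through that arithmetic under the standard LoRA formulation, where the update is scaled by alpha / r. It assumes rank-stabilized LoRA is off, since the config does not enable it. The 4096 x 4096 layer shape is a hypothetical example chosen for illustration, not a value taken from the config.

```python
# Values copied from the axolotl config above.
lora_r = 32
lora_alpha = 16

# Standard LoRA scales the low-rank update B @ A by alpha / r.
# Assumption: rank-stabilized LoRA is off, since the config does not enable it.
scaling = lora_alpha / lora_r


def lora_param_count(in_features: int, out_features: int, r: int) -> int:
    """Trainable parameters LoRA adds to one linear layer.

    A has shape (r, in_features) and B has shape (out_features, r).
    """
    return r * (in_features + out_features)


# Hypothetical layer shape for illustration: a square 4096 x 4096 projection.
in_features = out_features = 4096
added = lora_param_count(in_features, out_features, lora_r)
full = in_features * out_features

print(f"scaling = {scaling}")
print(f"adapter params for this layer = {added} ({added / full:.2%} of the full weight)")
```

With alpha smaller than r, the scaling factor is below 1, so the adapter's contribution is damped relative to a configuration where alpha equals r.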

outputs/llama3-8b-ht-v1-2

This model is a fine-tuned version of NousResearch/Meta-Llama-3-8B on the QTeam/htxllama_1 dataset. It achieves the following results on the evaluation set:

Loss: 2.5036 (validation loss at the last logged evaluation, step 608)

Model description

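A LoRA adapter for NousResearch/Meta-Llama-3-8B trained with axolotl 0.4.1. Per the config, the adapter uses rank 32, alpha 16, and dropout 0.05 on all linear layers, and saves embed_tokens and lm_head alongside the low-rank weights.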

Intended uses & limitations

More information needed

Training and evaluation data

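The training data is the QTeam/htxllama_1 dataset, loaded in alpaca format. Per the config (val_set_size: 0.05), 5% of it is held out as the evaluation set.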

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 3
total_train_batch_size: 6
total_eval_batch_size: 6
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 10
num_epochs: 10
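
The totals in this list follow from the per-device values, and the learning rate follows a warmup-then-cosine shape. The sketch below reproduces that arithmetic. The `cosine_lr` helper is a hand-written approximation of the usual linear-warmup cosine schedule, not code taken from the training run. The value `total_steps = 620` is an illustrative assumption, because the card does not report the total number of optimizer steps.

```python
import math

# Values copied from the config and the hyperparameter list.
micro_batch_size = 2
gradient_accumulation_steps = 1
num_devices = 3
sequence_len = 4096
learning_rate = 2e-4
warmup_steps = 10

# Sequences consumed per optimizer step across all GPUs.
total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Upper bound on tokens per optimizer step: every sequence holds at most
# sequence_len tokens (sample packing fills sequences, padding fills the rest).
max_tokens_per_step = total_train_batch_size * sequence_len


def cosine_lr(step: int, total_steps: int) -> float:
    """Linear warmup from 0, then cosine decay to 0 (a sketch of the usual shape)."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


# Assumption: 620 total steps is an illustrative value, not a reported one.
total_steps = 620
for step in (0, 5, 10, 315, 620):
    print(step, f"{cosine_lr(step, total_steps):.6f}")

print("total_train_batch_size =", total_train_batch_size)
print("max_tokens_per_step =", max_tokens_per_step)
```

Because gradient_accumulation_steps is 1, the effective batch size comes entirely from the three devices running two sequences each.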

Training results

Training Loss	Epoch	Step	Validation Loss
2.1533	0.0161	1	2.1831
1.6416	0.2581	16	1.7333
1.6154	0.5161	32	1.6458
1.5155	0.7742	48	1.5807
1.5359	1.0323	64	1.5371
1.0746	1.2581	80	1.5888
1.0806	1.5161	96	1.5696
1.0348	1.7742	112	1.5536
1.0769	2.0323	128	1.5341
0.6608	2.2581	144	1.6201
0.6918	2.5161	160	1.6185
0.7203	2.7742	176	1.6154
0.7172	3.0323	192	1.6202
0.3914	3.2581	208	1.7162
0.4111	3.5161	224	1.7114
0.4091	3.7742	240	1.7177
0.4103	4.0323	256	1.7191
0.1996	4.2581	272	1.8387
0.1932	4.5161	288	1.8439
0.2185	4.7742	304	1.8510
0.2221	5.0323	320	1.8515
0.0968	5.2581	336	2.0317
0.0937	5.5161	352	2.0138
0.0973	5.7742	368	2.0274
0.0830	6.0323	384	2.0257
0.0385	6.2581	400	2.1731
0.0411	6.5161	416	2.2114
0.0446	6.7742	432	2.2080
0.0426	7.0323	448	2.2194
0.0186	7.2581	464	2.4007
0.0186	7.5161	480	2.3837
0.0217	7.7742	496	2.3915
0.0201	8.0323	512	2.3953
0.0137	8.2581	528	2.4732
0.0158	8.5161	544	2.4896
0.0145	8.7742	560	2.4928
0.0145	9.0323	576	2.4964
0.0135	9.2581	592	2.5030
0.0149	9.5161	608	2.5036
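
One way to read this table is to find the evaluation step with the lowest validation loss and compare it with the final one. The sketch below does that with the rows copied from the table (step, training loss, validation loss). The helper name `best_eval` is hypothetical. Note that the config sets saves_per_epoch: 1, so the step it finds is an evaluation point and may not coincide with a saved checkpoint.

```python
# (step, training_loss, validation_loss), copied from the results table.
ROWS = [
    (1, 2.1533, 2.1831), (16, 1.6416, 1.7333), (32, 1.6154, 1.6458),
    (48, 1.5155, 1.5807), (64, 1.5359, 1.5371), (80, 1.0746, 1.5888),
    (96, 1.0806, 1.5696), (112, 1.0348, 1.5536), (128, 1.0769, 1.5341),
    (144, 0.6608, 1.6201), (160, 0.6918, 1.6185), (176, 0.7203, 1.6154),
    (192, 0.7172, 1.6202), (208, 0.3914, 1.7162), (224, 0.4111, 1.7114),
    (240, 0.4091, 1.7177), (256, 0.4103, 1.7191), (272, 0.1996, 1.8387),
    (288, 0.1932, 1.8439), (304, 0.2185, 1.8510), (320, 0.2221, 1.8515),
    (336, 0.0968, 2.0317), (352, 0.0937, 2.0138), (368, 0.0973, 2.0274),
    (384, 0.0830, 2.0257), (400, 0.0385, 2.1731), (416, 0.0411, 2.2114),
    (432, 0.0446, 2.2080), (448, 0.0426, 2.2194), (464, 0.0186, 2.4007),
    (480, 0.0186, 2.3837), (496, 0.0217, 2.3915), (512, 0.0201, 2.3953),
    (528, 0.0137, 2.4732), (544, 0.0158, 2.4896), (560, 0.0145, 2.4928),
    (576, 0.0145, 2.4964), (592, 0.0135, 2.5030), (608, 0.0149, 2.5036),
]


def best_eval(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


best_step, best_train, best_val = best_eval(ROWS)
final_step, final_train, final_val = ROWS[-1]

print(f"lowest validation loss: {best_val} at step {best_step}")
print(f"final validation loss:  {final_val} at step {final_step}")
print(f"train/validation gap at final step: {final_val - final_train:.4f}")
```

On these rows, validation loss bottoms out early and then climbs while training loss keeps falling, which is the usual signature of overfitting in later epochs.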

Framework versions

PEFT 0.12.0
Transformers 4.44.0
Pytorch 2.4.0+cu121
Datasets 2.20.0
Tokenizers 0.19.1
