---
license: llama3
library_name: peft
tags:
- generated_from_trainer
base_model: meta-llama/Meta-Llama-3-70B-Instruct
model-index:
- name: output/llama3-70b
  results: []
---

[Built with Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl)
See axolotl config

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3-70B-Instruct
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer

load_in_8bit: false
load_in_4bit: true
strict: false

datasets:
  - path: awilliamson/qbank_conversations
    type: chat_template
    chat_template: llama3
    field_messages: conversations
    message_field_role: from
    message_field_content: value
    roles:
      system:
        - system
      user:
        - user
      assistant:
        - assistant

chat_template: llama3

adapter: qlora
lora_r: 128
lora_alpha: 32
lora_modules_to_save: [embed_tokens, lm_head]
lora_dropout: 0.05
lora_target_linear: true

dataset_prepared_path: last_run_prepared
val_set_size: 0.05
output_dir: ./output/llama3-70b

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true

wandb_project: llama-70b
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 2e-4

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 0
evals_per_epoch: 5
eval_table_size:
saves_per_epoch: 1
save_total_limit: 10
save_steps:
debug:
weight_decay: 0.00
fsdp:
  - full_shard
  - auto_wrap
fsdp_config:
  fsdp_limit_all_gathers: true
  fsdp_sync_module_states: true
  fsdp_offload_params: true
  fsdp_use_orig_params: false
  fsdp_cpu_ram_efficient_loading: true
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: LlamaDecoderLayer
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_sharding_strategy: FULL_SHARD
special_tokens:
  pad_token: "<|end_of_text|>"
```

# output/llama3-70b

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) on the awilliamson/qbank_conversations dataset (the dataset named in the axolotl config).
It achieves the following results on the evaluation set:
- Loss: 1.5806

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 3

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 1.6238        | 0.0769 | 1    | 1.6328          |
| 1.2354        | 0.2308 | 3    | 1.6006          |
| 1.1512        | 0.4615 | 6    | 1.6043          |
| 1.1183        | 0.6923 | 9    | 1.5402          |
| 1.0818        | 0.9231 | 12   | 1.4909          |
| 0.7404        | 1.1538 | 15   | 1.4745          |
| 0.6681        | 1.3846 | 18   | 1.5023          |
| 0.6163        | 1.6154 | 21   | 1.5385          |
| 0.6596        | 1.8462 | 24   | 1.5612          |
| 0.5081        | 2.0769 | 27   | 1.5699          |
| 0.5118        | 2.3077 | 30   | 1.5786          |
| 0.4827        | 2.5385 | 33   | 1.5808          |
| 0.4768        | 2.7692 | 36   | 1.5800          |
| 0.484         | 3.0    | 39   | 1.5806          |

### Framework versions

- PEFT 0.11.1
- Transformers 4.41.1
- Pytorch 2.1.2+cu121
- Datasets 2.19.1
- Tokenizers 0.19.1
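The reported `total_train_batch_size` and the shape of the validation curve can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names `effective_batch_size` and `best_eval_step` are hypothetical and not part of Axolotl or Transformers, and the numbers are copied by hand from the hyperparameter list and results table in this card.

```python
# Values copied from the "Training hyperparameters" list in this card.
micro_batch_size = 2             # train_batch_size (per device)
num_devices = 4
gradient_accumulation_steps = 4


def effective_batch_size(per_device: int, devices: int, accum_steps: int) -> int:
    """Sequences consumed per optimizer step under data-parallel training."""
    return per_device * devices * accum_steps


# (step, validation loss) pairs copied from the "Training results" table.
eval_history = [
    (1, 1.6328), (3, 1.6006), (6, 1.6043), (9, 1.5402), (12, 1.4909),
    (15, 1.4745), (18, 1.5023), (21, 1.5385), (24, 1.5612), (27, 1.5699),
    (30, 1.5786), (33, 1.5808), (36, 1.5800), (39, 1.5806),
]


def best_eval_step(history):
    """Return the (step, loss) pair with the lowest validation loss."""
    return min(history, key=lambda pair: pair[1])


total = effective_batch_size(micro_batch_size, num_devices, gradient_accumulation_steps)
best_step, best_loss = best_eval_step(eval_history)

print(f"total_train_batch_size = {total}")                          # 32
print(f"lowest validation loss = {best_loss} at step {best_step}")  # 1.4745 at step 15
```

On these numbers, validation loss bottoms out at step 15 (early in the second epoch) and rises afterwards while training loss keeps falling, so the final checkpoint at step 39 is not necessarily the one with the best evaluation loss.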