See axolotl config
axolotl version: 0.4.0
base_model: nlpai-lab/KULLM3
base_model_config: nlpai-lab/KULLM3
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true
hub_model_id: kullm3_finetuning_test_4300QA_10epochs
load_in_8bit: false
load_in_4bit: true
strict: false
datasets:
  - path: superiort/multiplechoice-4300
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.02
output_dir: ./kullm3_finetuning_test_4300QA_10epochs
adapter: qlora
lora_model_dir:
sequence_len: 4096
sample_packing: false
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project: axolotl
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:
gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 10
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps: 100
eval_steps: 0.01
save_strategy: epoch
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
  pad_token: "</s>" # EOS and PAD are identical
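
For reference, below is a rough, hand-written sketch of how the QLoRA settings in this config would map onto PEFT / bitsandbytes objects. It is not what axolotl builds internally; the nf4 and double-quant choices are assumptions (common axolotl defaults) and are not spelled out in the config itself.

```python
# Sketch of the QLoRA setup implied by the config above (assumptions noted inline).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load_in_4bit: true
    bnb_4bit_quant_type="nf4",              # assumption, not in the config
    bnb_4bit_use_double_quant=True,         # assumption, not in the config
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16: true
)

model = AutoModelForCausalLM.from_pretrained(
    "nlpai-lab/KULLM3",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=32,                         # lora_r
    lora_alpha=16,                # lora_alpha
    lora_dropout=0.05,            # lora_dropout
    target_modules="all-linear",  # lora_target_linear: true
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

The `target_modules="all-linear"` choice mirrors `lora_target_linear: true`, which adapts every linear projection rather than a fixed list of module names.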
kullm3_finetuning_test_4300QA_10epochs
This model is a fine-tuned version of nlpai-lab/KULLM3 on the superiort/multiplechoice-4300 dataset. It achieves the following results on the evaluation set:
- Loss: 0.4754
Model description
More information needed
Intended uses & limitations
More information needed
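
A minimal inference sketch with the published adapter, assuming it lives under the hub_model_id from the config; `YOUR_NAMESPACE` is a hypothetical placeholder, since the account name is not shown in this card.

```python
# Minimal LoRA-adapter inference sketch (assumptions noted in the lead-in).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "nlpai-lab/KULLM3"
adapter_id = "YOUR_NAMESPACE/kullm3_finetuning_test_4300QA_10epochs"  # hypothetical repo path

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # bf16 here; the base could equally be loaded in 4-bit as during training
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

prompt = "..."  # an instruction formatted in the Alpaca style used for training
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```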
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- total_eval_batch_size: 8
- optimizer: paged 32-bit AdamW (paged_adamw_32bit) with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 10
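
The total batch sizes listed above follow directly from the per-device batch size, the gradient accumulation steps, and the device count; a quick arithmetic check:

```python
# Effective batch sizes implied by the hyperparameters above,
# assuming the 4 GPUs listed under num_devices.
micro_batch_size = 2             # per-device train/eval batch size
gradient_accumulation_steps = 4
num_devices = 4

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices  # no accumulation at eval time

assert total_train_batch_size == 32
assert total_eval_batch_size == 8
```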
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
0.4883 | 0.01 | 1 | 0.3229 |
0.4139 | 0.11 | 14 | 0.2783 |
0.3475 | 0.21 | 28 | 0.2473 |
0.3427 | 0.32 | 42 | 0.2353 |
0.303 | 0.43 | 56 | 0.2297 |
0.2902 | 0.53 | 70 | 0.2334 |
0.288 | 0.64 | 84 | 0.2271 |
0.2856 | 0.74 | 98 | 0.2233 |
0.3035 | 0.85 | 112 | 0.2182 |
0.2829 | 0.96 | 126 | 0.2161 |
0.2986 | 1.06 | 140 | 0.2219 |
0.2552 | 1.17 | 154 | 0.2269 |
0.2489 | 1.28 | 168 | 0.2223 |
0.2523 | 1.38 | 182 | 0.2248 |
0.2481 | 1.49 | 196 | 0.2220 |
0.235 | 1.59 | 210 | 0.2209 |
0.2661 | 1.7 | 224 | 0.2165 |
0.2522 | 1.81 | 238 | 0.2231 |
0.2775 | 1.91 | 252 | 0.2190 |
0.1825 | 2.02 | 266 | 0.2228 |
0.1836 | 2.13 | 280 | 0.2331 |
0.1655 | 2.23 | 294 | 0.2378 |
0.1604 | 2.34 | 308 | 0.2376 |
0.1766 | 2.44 | 322 | 0.2356 |
0.1897 | 2.55 | 336 | 0.2344 |
0.1756 | 2.66 | 350 | 0.2375 |
0.1616 | 2.76 | 364 | 0.2387 |
0.1436 | 2.87 | 378 | 0.2371 |
0.166 | 2.98 | 392 | 0.2341 |
0.0828 | 3.08 | 406 | 0.2602 |
0.0893 | 3.19 | 420 | 0.2747 |
0.079 | 3.29 | 434 | 0.2760 |
0.0843 | 3.4 | 448 | 0.2780 |
0.0815 | 3.51 | 462 | 0.2812 |
0.0948 | 3.61 | 476 | 0.2828 |
0.0845 | 3.72 | 490 | 0.2766 |
0.1025 | 3.83 | 504 | 0.2772 |
0.0763 | 3.93 | 518 | 0.2813 |
0.0322 | 4.04 | 532 | 0.3309 |
0.031 | 4.14 | 546 | 0.3221 |
0.028 | 4.25 | 560 | 0.3348 |
0.031 | 4.36 | 574 | 0.3374 |
0.0309 | 4.46 | 588 | 0.3355 |
0.0331 | 4.57 | 602 | 0.3344 |
0.034 | 4.68 | 616 | 0.3384 |
0.0324 | 4.78 | 630 | 0.3420 |
0.0301 | 4.89 | 644 | 0.3350 |
0.0327 | 4.99 | 658 | 0.3387 |
0.0111 | 5.1 | 672 | 0.4010 |
0.0089 | 5.21 | 686 | 0.3917 |
0.0075 | 5.31 | 700 | 0.3925 |
0.0106 | 5.42 | 714 | 0.3911 |
0.0091 | 5.53 | 728 | 0.3937 |
0.0109 | 5.63 | 742 | 0.3985 |
0.009 | 5.74 | 756 | 0.4044 |
0.0095 | 5.84 | 770 | 0.3949 |
0.0075 | 5.95 | 784 | 0.3984 |
0.0036 | 6.06 | 798 | 0.4133 |
0.0031 | 6.16 | 812 | 0.4424 |
0.0026 | 6.27 | 826 | 0.4525 |
0.0034 | 6.38 | 840 | 0.4519 |
0.0019 | 6.48 | 854 | 0.4513 |
0.0018 | 6.59 | 868 | 0.4517 |
0.0023 | 6.69 | 882 | 0.4520 |
0.0016 | 6.8 | 896 | 0.4534 |
0.0018 | 6.91 | 910 | 0.4528 |
0.001 | 7.01 | 924 | 0.4537 |
0.0011 | 7.12 | 938 | 0.4581 |
0.0009 | 7.23 | 952 | 0.4631 |
0.0009 | 7.33 | 966 | 0.4662 |
0.0013 | 7.44 | 980 | 0.4680 |
0.0008 | 7.54 | 994 | 0.4700 |
0.001 | 7.65 | 1008 | 0.4711 |
0.0009 | 7.76 | 1022 | 0.4720 |
0.0011 | 7.86 | 1036 | 0.4727 |
0.0009 | 7.97 | 1050 | 0.4731 |
0.0011 | 8.08 | 1064 | 0.4735 |
0.001 | 8.18 | 1078 | 0.4739 |
0.001 | 8.29 | 1092 | 0.4741 |
0.001 | 8.39 | 1106 | 0.4746 |
0.0011 | 8.5 | 1120 | 0.4744 |
0.0012 | 8.61 | 1134 | 0.4751 |
0.0011 | 8.71 | 1148 | 0.4748 |
0.001 | 8.82 | 1162 | 0.4747 |
0.0009 | 8.93 | 1176 | 0.4754 |
0.0011 | 9.03 | 1190 | 0.4752 |
0.0013 | 9.14 | 1204 | 0.4751 |
0.0009 | 9.24 | 1218 | 0.4749 |
0.001 | 9.35 | 1232 | 0.4750 |
0.0017 | 9.46 | 1246 | 0.4750 |
0.0012 | 9.56 | 1260 | 0.4749 |
0.0008 | 9.67 | 1274 | 0.4747 |
0.0008 | 9.78 | 1288 | 0.4749 |
0.0011 | 9.88 | 1302 | 0.4754 |
Framework versions
- PEFT 0.10.0
- Transformers 4.40.0.dev0
- Pytorch 2.1.2+cu121
- Datasets 2.15.0
- Tokenizers 0.15.2