---
library_name: peft
license: apache-2.0
base_model: Qwen/Qwen2.5-7B-Instruct
tags:
- axolotl
- generated_from_trainer
datasets:
- medalpaca/medical_meadow_medqa
model-index:
- name: lora-qwen-25-7b-instruct
  results: []
---
<details><summary>See axolotl config</summary>

axolotl version: `0.6.0`

```yaml
base_model: Qwen/Qwen2.5-7B-Instruct
trust_remote_code: true
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
load_in_8bit:
load_in_4bit:
strict: false
datasets:
- path: medalpaca/medical_meadow_medqa
type: alpaca
dataset_prepared_path:
val_set_size: 0.1
output_dir: ./lora-qwen25
sequence_len: 8192
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true
adapter: lora
lora_r: 256
lora_alpha: 128
lora_dropout: 0.05
#lora_target_modules:
# - q_proj
# - v_proj
# - k_proj
# - o_proj
# - gate_proj
# - down_proj
# - up_proj
lora_target_linear: true
wandb_project: lora-qwen-25-7b-instruct
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
gradient_accumulation_steps: 1
micro_batch_size: 1
num_epochs: 3
optimizer: adamw_torch
lr_scheduler: cosine
learning_rate: 0.00001
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
logging_steps: 1
xformers_attention:
flash_attention: true
warmup_steps:
eval_steps:
save_steps:
evals_per_epoch: 16
saves_per_epoch: 2
debug:
deepspeed: deepspeed_configs/zero2.json
weight_decay:
fsdp:
fsdp_config:
special_tokens:
hub_model_id: neginashz/lora-qwen-25-7b-instruct
hub_strategy:
early_stopping_patience:
resume_from_checkpoint:
auto_resume_from_checkpoints: true
```

</details>
# lora-qwen-25-7b-instruct
This model is a fine-tuned version of [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) on the [medalpaca/medical_meadow_medqa](https://huggingface.co/datasets/medalpaca/medical_meadow_medqa) dataset. It achieves the following results on the evaluation set:
- Loss: 0.1181
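A minimal inference sketch follows, assuming the standard `transformers` + `peft` loading path. Since the dataset was formatted with axolotl's `alpaca` type, an Alpaca-style prompt presumably matches what the adapter saw during training; the prompt wording, placeholder input, and generation settings below are illustrative only.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model and attach the LoRA adapter from the Hub.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct", trust_remote_code=True)
model = PeftModel.from_pretrained(base, "neginashz/lora-qwen-25-7b-instruct")

# Alpaca-style prompt (assumption: this mirrors the training format).
prompt = (
    "Below is an instruction that describes a task, paired with an input that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nAnswer this multiple-choice medical question.\n\n"
    "### Input:\n<question and answer options here>\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```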
## Model description
This repository contains a LoRA adapter for Qwen/Qwen2.5-7B-Instruct, trained with axolotl 0.6.0 on medical multiple-choice QA data. The adapter uses rank 256, alpha 128, dropout 0.05, and targets all linear projection layers (`lora_target_linear: true`).
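For reference, the LoRA settings in the config roughly correspond to the following PEFT `LoraConfig`. This is a sketch, not the exact object axolotl builds internally; the explicit module list is an assumption based on the (commented-out) `lora_target_modules` entries above.

```python
from peft import LoraConfig

# Approximate PEFT-side equivalent of the axolotl LoRA block.
# `lora_target_linear: true` targets every linear projection layer; for
# Qwen2.5 that is assumed to be the set of modules listed below.
lora_config = LoraConfig(
    r=256,
    lora_alpha=128,          # effective scaling = lora_alpha / r = 0.5
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "down_proj", "up_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```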
## Intended uses & limitations
More information needed
## Training and evaluation data
Training used the medalpaca/medical_meadow_medqa dataset in axolotl's `alpaca` prompt format, with 10% of the examples held out for evaluation (`val_set_size: 0.1`).
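A minimal sketch of an equivalent load-and-split with the `datasets` library; axolotl performs its own split internally, so the exact evaluation examples may differ from what this produces.

```python
from datasets import load_dataset

# Load the MedQA instruction dataset used for fine-tuning.
dataset = load_dataset("medalpaca/medical_meadow_medqa", split="train")

# Hold out 10% for evaluation, mirroring `val_set_size: 0.1` in the config.
split = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = split["train"], split["test"]
```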
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 4
- total_eval_batch_size: 4
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 7
- num_epochs: 3
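The total batch sizes listed above are simply the product of the per-device settings and the 4-GPU DeepSpeed ZeRO-2 setup:

```python
micro_batch_size = 1              # per-device train/eval batch size
gradient_accumulation_steps = 1
num_devices = 4                   # multi-GPU (DeepSpeed ZeRO-2)

total_train_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices
total_eval_batch_size = micro_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 4 4
```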
### Training results
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 2.774         | 0.0741 | 6    | 2.5571          |
| 1.4649        | 0.1481 | 12   | 1.3144          |
| 0.649         | 0.2222 | 18   | 0.4603          |
| 0.1557        | 0.2963 | 24   | 0.1620          |
| 0.1792        | 0.3704 | 30   | 0.1539          |
| 0.1432        | 0.4444 | 36   | 0.1422          |
| 0.1393        | 0.5185 | 42   | 0.1385          |
| 0.1137        | 0.5926 | 48   | 0.1340          |
| 0.1246        | 0.6667 | 54   | 0.1317          |
| 0.1235        | 0.7407 | 60   | 0.1313          |
| 0.123         | 0.8148 | 66   | 0.1293          |
| 0.1413        | 0.8889 | 72   | 0.1277          |
| 0.1338        | 0.9630 | 78   | 0.1268          |
| 0.1093        | 1.0247 | 84   | 0.1263          |
| 0.1442        | 1.0988 | 90   | 0.1265          |
| 0.1127        | 1.1728 | 96   | 0.1244          |
| 0.137         | 1.2469 | 102  | 0.1231          |
| 0.1098        | 1.3210 | 108  | 0.1224          |
| 0.1276        | 1.3951 | 114  | 0.1223          |
| 0.102         | 1.4691 | 120  | 0.1215          |
| 0.1208        | 1.5432 | 126  | 0.1217          |
| 0.1143        | 1.6173 | 132  | 0.1211          |
| 0.1315        | 1.6914 | 138  | 0.1204          |
| 0.1166        | 1.7654 | 144  | 0.1200          |
| 0.1055        | 1.8395 | 150  | 0.1200          |
| 0.1235        | 1.9136 | 156  | 0.1194          |
| 0.12          | 1.9877 | 162  | 0.1193          |
| 0.0982        | 2.0494 | 168  | 0.1193          |
| 0.1129        | 2.1235 | 174  | 0.1188          |
| 0.1094        | 2.1975 | 180  | 0.1190          |
| 0.1216        | 2.2716 | 186  | 0.1191          |
| 0.1387        | 2.3457 | 192  | 0.1187          |
| 0.1001        | 2.4198 | 198  | 0.1184          |
| 0.1031        | 2.4938 | 204  | 0.1185          |
| 0.0818        | 2.5679 | 210  | 0.1183          |
| 0.126         | 2.6420 | 216  | 0.1185          |
| 0.124         | 2.7160 | 222  | 0.1183          |
| 0.1193        | 2.7901 | 228  | 0.1184          |
| 0.1082        | 2.8642 | 234  | 0.1183          |
| 0.1181        | 2.9383 | 240  | 0.1181          |
### Framework versions
- PEFT 0.14.0
- Transformers 4.47.0
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.21.0