afrias5 committed
Commit 1a8ed49
1 Parent(s): d15f27a

End of training

Files changed (1)
  1. README.md +12 -14
README.md CHANGED
@@ -33,9 +33,9 @@ datasets:
 
 dataset_prepared_path: Allama3dataset
 val_set_size: 0
- output_dir: models/Allama370b #change
- # lora_model_dir: models/llama370b
- # auto_resume_from_checkpoints: true
+ output_dir: models/Allama370b
+ lora_model_dir: models/Allama370b/checkpoint-36
+ auto_resume_from_checkpoints: true
 sequence_len: 4096
 sample_packing: true
 pad_to_sequence_len: true
@@ -59,7 +59,7 @@ wandb_log_model:
 
 gradient_accumulation_steps: 4
 micro_batch_size: 1
- num_epochs: 4
+ num_epochs: 8
 optimizer: adamw_torch
 lr_scheduler: cosine
 learning_rate: 0.0002
@@ -71,33 +71,31 @@ fp16:
 tf32: false
 hub_model_id: afrias5/Allama370b
 gradient_checkpointing: true
+ early_stopping_patience:
+ resume_from_checkpoint:
 local_rank:
 logging_steps: 1
 xformers_attention:
 flash_attention: true
 s2_attention:
+ logging_steps: 1
 warmup_steps: 10
 # eval_steps: 300
 saves_per_epoch: 1
 save_total_limit: 12
 debug:
- deepspeed: deepspeed_configs/zero3_bf16.json
- weight_decay: 0.0
+ deepspeed:
+ weight_decay: 0.0
 fsdp:
+ deepspeed: deepspeed_configs/zero3_bf16.json
 fsdp_config:
 special_tokens:
   pad_token: <|end_of_text|>
-
-
-
-
-
-
 ```
 
 </details><br>
 
- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/afrias5/llama3run/runs/0xrylxx1)
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/afrias5/llama3run/runs/9o5mcasc)
 # Allama370b
 
 This model is a fine-tuned version of [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) on the None dataset.
@@ -131,7 +129,7 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
- - num_epochs: 4
+ - num_epochs: 8
 
 ### Training results
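
Read together, the added lines reconfigure the run to continue from a saved checkpoint rather than start fresh. A minimal sketch of just the keys this commit touches, with values copied from the new side of the diff and comments assuming the usual Axolotl semantics for these options (the rest of the config is unchanged and omitted):

```yaml
# Sketch of the resume-related settings from this commit; values are taken
# from the diff above, comments reflect assumed Axolotl behavior.
output_dir: models/Allama370b
lora_model_dir: models/Allama370b/checkpoint-36   # load the previously saved adapter checkpoint
auto_resume_from_checkpoints: true                # pick up trainer state from output_dir if present
num_epochs: 8                                     # raised from 4 to keep training past the first run
deepspeed: deepspeed_configs/zero3_bf16.json      # same ZeRO-3 bf16 DeepSpeed config as before
```

Assuming the standard Axolotl entry point, a command such as `accelerate launch -m axolotl.cli.train config.yml` would then resume from checkpoint-36 instead of reinitializing the run.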