End of training
README.md CHANGED
````diff
@@ -33,9 +33,9 @@ datasets:
 
 dataset_prepared_path: Allama3dataset
 val_set_size: 0
-output_dir: models/Allama370b
-
-
+output_dir: models/Allama370b
+lora_model_dir: models/Allama370b/checkpoint-36
+auto_resume_from_checkpoints: true
 sequence_len: 4096
 sample_packing: true
 pad_to_sequence_len: true
````
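The substantive change in this first hunk is checkpoint resumption: the run appears to have been restarted from `checkpoint-36` rather than from scratch. Below is a minimal annotated sketch of the new keys, assuming standard axolotl semantics; verify against the axolotl docs for the version used here.

```yaml
# Sketch only; the comments reflect my reading of axolotl's options.
output_dir: models/Allama370b                    # where trainer checkpoints are written
lora_model_dir: models/Allama370b/checkpoint-36  # load existing LoRA adapter weights from this checkpoint
auto_resume_from_checkpoints: true               # resume from the most recent checkpoint found in output_dir
```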
````diff
@@ -59,7 +59,7 @@ wandb_log_model:
 
 gradient_accumulation_steps: 4
 micro_batch_size: 1
-num_epochs:
+num_epochs: 8
 optimizer: adamw_torch
 lr_scheduler: cosine
 learning_rate: 0.0002
````
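For scale: with `micro_batch_size: 1` and `gradient_accumulation_steps: 4`, each optimizer step accumulates 4 forward passes per GPU, i.e. 4 packed windows of 4,096 tokens each under `sample_packing`, multiplied by however many GPUs the run used; the GPU count is not recoverable from this diff.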
````diff
@@ -71,33 +71,31 @@ fp16:
 tf32: false
 hub_model_id: afrias5/Allama370b
 gradient_checkpointing: true
+early_stopping_patience:
+resume_from_checkpoint:
 local_rank:
 logging_steps: 1
 xformers_attention:
 flash_attention: true
 s2_attention:
+logging_steps: 1
 warmup_steps: 10
 # eval_steps: 300
 saves_per_epoch: 1
 save_total_limit: 12
 debug:
-deepspeed:
-weight_decay: 0.0
+deepspeed:
+weight_decay: 0.0
 fsdp:
+deepspeed: deepspeed_configs/zero3_bf16.json
 fsdp_config:
 special_tokens:
 pad_token: <|end_of_text|>
-
-
-
-
-
-
 ```
 
 </details><br>
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/afrias5/llama3run/runs/
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/afrias5/llama3run/runs/9o5mcasc)
 # Allama370b
 
 This model is a fine-tuned version of [meta-llama/Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) on the None dataset.
````
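Beyond filling in the W&B run link, the hunk above switches multi-GPU sharding to DeepSpeed ZeRO-3 via `deepspeed: deepspeed_configs/zero3_bf16.json`. Note that the resulting config also retains an earlier empty `deepspeed:` key and now lists `logging_steps: 1` twice; YAML loaders such as PyYAML keep the last occurrence of a duplicate key, so the ZeRO-3 path wins and the duplicates are harmless. The referenced JSON file is not part of this commit; for orientation, a stock ZeRO-3 + bf16 DeepSpeed config looks roughly like the sketch below. Every key here is an assumption (JSON has no comment syntax to flag them inline), so check the actual `deepspeed_configs/zero3_bf16.json` shipped with axolotl.

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_bucket_size": "auto",
    "stage3_prefetch_bucket_size": "auto",
    "stage3_param_persistence_threshold": "auto",
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": {
    "enabled": true
  },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto"
}
```

ZeRO stage 3 shards parameters, gradients, and optimizer states across GPUs, which is the usual way to make a 70B base model fit for LoRA-style fine-tuning.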
````diff
@@ -131,7 +129,7 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs:
+- num_epochs: 8
 
 ### Training results
 
````