End of training

Files changed:
- README.md: +27 -30
- adapter_model.bin: +1 -1
README.md CHANGED

````diff
@@ -1,12 +1,12 @@
 ---
-base_model: meta-llama/Meta-Llama-3
+base_model: meta-llama/Meta-Llama-3-8B
 library_name: peft
-license: llama3
+license: llama3
 tags:
 - axolotl
 - generated_from_trainer
 model-index:
-- name: llama-3
+- name: llama-3-8b-ocr-correction
   results: []
 ---

@@ -18,7 +18,7 @@ should probably proofread and complete it, then remove this comment. -->

 axolotl version: `0.4.1`
 ```yaml
-base_model: meta-llama/Meta-Llama-3
+base_model: meta-llama/Meta-Llama-3-8B
 model_type: AutoModelForCausalLM
 tokenizer_type: AutoTokenizer

@@ -34,14 +34,14 @@ datasets:
   - path: ft_data/alpaca_data.jsonl
     type: alpaca
 dataset_prepared_path: last_run_prepared
-val_set_size: 0.
+val_set_size: 0.1
 output_dir: ./qlora-alpaca-out
-hub_model_id: pbevan11/llama-3
+hub_model_id: pbevan11/llama-3-8b-ocr-correction

 adapter: qlora
 lora_model_dir:

-sequence_len:
+sequence_len: 4096
 sample_packing: true
 pad_to_sequence_len: true

@@ -51,22 +51,15 @@ lora_dropout: 0.05
 lora_target_linear: true
 lora_fan_in_fan_out:
 lora_target_modules:
-  - gate_proj
-  - down_proj
-  - up_proj
-  - q_proj
-  - v_proj
-  - k_proj
-  - o_proj

 wandb_project: ocr-ft
 wandb_entity: sncds
-wandb_name:
+wandb_name: test

 gradient_accumulation_steps: 4
 micro_batch_size: 2 # was 16
 eval_batch_size: 2 # was 16
-num_epochs:
+num_epochs: 3
 optimizer: paged_adamw_32bit
 lr_scheduler: cosine
 learning_rate: 0.0002
@@ -103,12 +96,12 @@ special_tokens:

 </details><br>

-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/4fhldwb5)
-# llama-3
+# llama-3-8b-ocr-correction

-This model is a fine-tuned version of [meta-llama/Meta-Llama-3
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.1778

 ## Model description

@@ -136,26 +129,30 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs:
+- num_epochs: 3

 ### Training results

 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
+| 0.5646        | 0.0174 | 1    | 0.6286          |
+| 0.3257        | 0.2609 | 15   | 0.2889          |
+| 0.2285        | 0.5217 | 30   | 0.2171          |
+| 0.1727        | 0.7826 | 45   | 0.1910          |
+| 0.1497        | 1.0174 | 60   | 0.1792          |
+| 0.1545        | 1.2783 | 75   | 0.1758          |
+| 0.1317        | 1.5391 | 90   | 0.1738          |
+| 0.1256        | 1.8    | 105  | 0.1699          |
+| 0.0941        | 2.0348 | 120  | 0.1676          |
+| 0.0723        | 2.2957 | 135  | 0.1783          |
+| 0.07          | 2.5565 | 150  | 0.1779          |
+| 0.073         | 2.8174 | 165  | 0.1778          |


 ### Framework versions

 - PEFT 0.11.1
-- Transformers 4.
+- Transformers 4.42.3
 - Pytorch 2.1.2+cu118
 - Datasets 2.19.1
 - Tokenizers 0.19.1
````
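The updated card describes a QLoRA adapter trained with axolotl on Alpaca-formatted OCR-correction data. Below is a minimal sketch of using the result; the repo id `pbevan11/llama-3-8b-ocr-correction` and base model `meta-llama/Meta-Llama-3-8B` come from the card above, while the instruction text is only an illustrative placeholder.

```python
# Minimal sketch: load the QLoRA adapter from this repo on top of the base model.
# Repo ids are taken from the model card above; the prompt below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"
adapter_id = "pbevan11/llama-3-8b-ocr-correction"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the PEFT (LoRA) adapter weights stored in adapter_model.bin.
model = PeftModel.from_pretrained(base_model, adapter_id)

# The config above uses `type: alpaca`, so an Alpaca-style instruction prompt is a
# reasonable default; the exact instruction wording here is assumed, not from the card.
prompt = (
    "### Instruction:\nCorrect the OCR errors in the following text.\n\n"
    "### Input:\nTh1s is an examp1e sentenc3 with OCR n0ise.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading the adapter this way leaves the base weights untouched; `merge_and_unload()` could be called afterwards if a standalone merged model is preferred.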
adapter_model.bin CHANGED

````diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:5c28303892a6636295f8e3b90fae48da861a566c88260c5f90bfd4f586492399
 size 167934026
````
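The `adapter_model.bin` change only swaps the Git LFS pointer: the repository stores the SHA-256 oid and byte size shown above, while the weights themselves live in LFS storage. A rough sketch of checking a downloaded copy against that pointer, assuming `huggingface_hub` is available and using the repo id from the card:

```python
# Sketch: verify a downloaded adapter_model.bin against the LFS pointer above.
# The expected oid and size are taken from the pointer file in this commit.
import hashlib
import os

from huggingface_hub import hf_hub_download

EXPECTED_SHA256 = "5c28303892a6636295f8e3b90fae48da861a566c88260c5f90bfd4f586492399"
EXPECTED_SIZE = 167934026

path = hf_hub_download("pbevan11/llama-3-8b-ocr-correction", "adapter_model.bin")

# Compare the file size first (cheap), then the SHA-256 digest (what the LFS oid encodes).
assert os.path.getsize(path) == EXPECTED_SIZE, "size mismatch"

sha = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)

assert sha.hexdigest() == EXPECTED_SHA256, "checksum mismatch"
print("adapter_model.bin matches the LFS pointer")
```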