pbevan11 committed
Commit 1bce1c1
1 Parent(s): 893d95d

End of training

Files changed (2)
  1. README.md +27 -30
  2. adapter_model.bin +1 -1
README.md CHANGED
@@ -1,12 +1,12 @@
  ---
- base_model: meta-llama/Meta-Llama-3.1-8B
+ base_model: meta-llama/Meta-Llama-3-8B
  library_name: peft
- license: llama3.1
+ license: llama3
  tags:
  - axolotl
  - generated_from_trainer
  model-index:
- - name: llama-3.1-8b-ocr-correction
+ - name: llama-3-8b-ocr-correction
  results: []
  ---

@@ -18,7 +18,7 @@ should probably proofread and complete it, then remove this comment. -->

  axolotl version: `0.4.1`
  ```yaml
- base_model: meta-llama/Meta-Llama-3.1-8B
+ base_model: meta-llama/Meta-Llama-3-8B
  model_type: AutoModelForCausalLM
  tokenizer_type: AutoTokenizer

@@ -34,14 +34,14 @@ datasets:
  - path: ft_data/alpaca_data.jsonl
  type: alpaca
  dataset_prepared_path: last_run_prepared
- val_set_size: 0.05
+ val_set_size: 0.1
  output_dir: ./qlora-alpaca-out
- hub_model_id: pbevan11/llama-3.1-8b-ocr-correction
+ hub_model_id: pbevan11/llama-3-8b-ocr-correction

  adapter: qlora
  lora_model_dir:

- sequence_len: 8192
+ sequence_len: 4096
  sample_packing: true
  pad_to_sequence_len: true

@@ -51,22 +51,15 @@ lora_dropout: 0.05
  lora_target_linear: true
  lora_fan_in_fan_out:
  lora_target_modules:
- - gate_proj
- - down_proj
- - up_proj
- - q_proj
- - v_proj
- - k_proj
- - o_proj

  wandb_project: ocr-ft
  wandb_entity: sncds
- wandb_name: llama31
+ wandb_name: test

  gradient_accumulation_steps: 4
  micro_batch_size: 2 # was 16
  eval_batch_size: 2 # was 16
- num_epochs: 2
+ num_epochs: 3
  optimizer: paged_adamw_32bit
  lr_scheduler: cosine
  learning_rate: 0.0002
@@ -103,12 +96,12 @@ special_tokens:

  </details><br>

- [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/rotjhntf)
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/4fhldwb5)
- # llama-3.1-8b-ocr-correction
+ # llama-3-8b-ocr-correction

- This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the None dataset.
+ This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the None dataset.
  It achieves the following results on the evaluation set:
- - Loss: 0.1901
+ - Loss: 0.1778

  ## Model description

@@ -136,26 +129,30 @@ The following hyperparameters were used during training:
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: cosine
  - lr_scheduler_warmup_steps: 10
- - num_epochs: 2
+ - num_epochs: 3

  ### Training results

  | Training Loss | Epoch | Step | Validation Loss |
  |:-------------:|:------:|:----:|:---------------:|
- | 0.61 | 0.0331 | 1 | 0.6018 |
- | 0.4379 | 0.2645 | 8 | 0.4256 |
- | 0.2531 | 0.5289 | 16 | 0.2714 |
- | 0.2366 | 0.7934 | 24 | 0.2247 |
- | 0.1839 | 1.0331 | 32 | 0.2053 |
- | 0.1752 | 1.2975 | 40 | 0.1961 |
- | 0.1629 | 1.5620 | 48 | 0.1909 |
- | 0.163 | 1.8264 | 56 | 0.1901 |
+ | 0.5646 | 0.0174 | 1 | 0.6286 |
+ | 0.3257 | 0.2609 | 15 | 0.2889 |
+ | 0.2285 | 0.5217 | 30 | 0.2171 |
+ | 0.1727 | 0.7826 | 45 | 0.1910 |
+ | 0.1497 | 1.0174 | 60 | 0.1792 |
+ | 0.1545 | 1.2783 | 75 | 0.1758 |
+ | 0.1317 | 1.5391 | 90 | 0.1738 |
+ | 0.1256 | 1.8 | 105 | 0.1699 |
+ | 0.0941 | 2.0348 | 120 | 0.1676 |
+ | 0.0723 | 2.2957 | 135 | 0.1783 |
+ | 0.07 | 2.5565 | 150 | 0.1779 |
+ | 0.073 | 2.8174 | 165 | 0.1778 |


  ### Framework versions

  - PEFT 0.11.1
- - Transformers 4.43.2
+ - Transformers 4.42.3
  - Pytorch 2.1.2+cu118
  - Datasets 2.19.1
  - Tokenizers 0.19.1
adapter_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:befe7ee91cb8ab62450880c1dabf645b053b56d4e5b4cf5a4776e29329224eeb
+ oid sha256:5c28303892a6636295f8e3b90fae48da861a566c88260c5f90bfd4f586492399
  size 167934026
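
The diff above uploads a QLoRA adapter for `pbevan11/llama-3-8b-ocr-correction` but does not show a usage snippet, so here is a minimal sketch of how one might load and query it. It assumes the adapter applies on top of the gated `meta-llama/Meta-Llama-3-8B` base weights, that tokenizer files were pushed alongside the adapter, and that prompts follow the standard Alpaca template implied by `type: alpaca` in the config; the instruction and input text below are purely illustrative.

```python
# Minimal sketch: load the QLoRA adapter from the Hub and run one OCR-correction prompt.
# Assumes access to the gated meta-llama/Meta-Llama-3-8B base model and PEFT >= 0.11.
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "pbevan11/llama-3-8b-ocr-correction"  # hub_model_id from the config in the diff

# AutoPeftModelForCausalLM reads the adapter config, downloads the base model,
# and attaches the LoRA weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# If the adapter repo has no tokenizer files, load the tokenizer from the base model instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# The training data is Alpaca-formatted, so a standard Alpaca prompt is assumed here.
prompt = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nCorrect the OCR errors in the following text.\n\n"
    "### Input:\nTbe qvick brovvn f0x jumps ovcr the lazy d0g.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Print only the newly generated tokens (the model's correction).
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

For repeated inference one could also fold the adapter into the base weights with `model.merge_and_unload()` and save the merged model, trading extra disk space for slightly faster generation.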