Training in progress, step 57
- README.md +25 -89
- adapter_config.json +4 -4
- adapter_model.bin +1 -1
- adapter_model.safetensors +1 -1
- training_args.bin +1 -1
README.md
CHANGED
@@ -1,12 +1,12 @@
 ---
-base_model: meta-llama/Meta-Llama-3-8B
 library_name: peft
-license: llama3
 tags:
 - axolotl
 - generated_from_trainer
 model-index:
-- name: llama-3-8b-ocr-correction
 results: []
 ---

@@ -18,10 +18,9 @@ should probably proofread and complete it, then remove this comment. -->

 axolotl version: `0.4.1`
 ```yaml
-base_model: meta-llama/Meta-Llama-3-8B
 model_type: AutoModelForCausalLM
 tokenizer_type: AutoTokenizer
-is_mistral_derived_model: true

 load_in_8bit: false
 load_in_4bit: true
@@ -35,14 +34,14 @@ datasets:
 - path: ft_data/alpaca_data.jsonl
 type: alpaca
 dataset_prepared_path: last_run_prepared
-val_set_size: 0.
 output_dir: ./qlora-alpaca-out
-hub_model_id: pbevan11/llama-3-8b-ocr-correction

 adapter: qlora
 lora_model_dir:

-sequence_len:
 sample_packing: true
 pad_to_sequence_len: true

@@ -62,7 +61,7 @@ lora_target_modules:

 wandb_project: ocr-ft
 wandb_entity: sncds
-wandb_name:

 gradient_accumulation_steps: 4
 micro_batch_size: 2 # was 16
@@ -104,86 +103,24 @@ special_tokens:

 </details><br>

-

-This model is a
 It achieves the following results on the evaluation set:
-- Loss: 0.

-##
-
-First, download the model
-
-```python
-from peft import AutoPeftModelForCausalLM
-from transformers import AutoTokenizer
-model_id='pbevan11/llama-3-8b-ocr-correction'
-model = AutoPeftModelForCausalLM.from_pretrained(model_id).cuda()
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-tokenizer.pad_token = tokenizer.eos_token
-```
-
-Then, construct the prompt template like so:
-
-```python
-def prompt(instruction, inp):
-    return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
-
-### Instruction:
-{instruction}
-
-### Input:
-{inp}
-
-### Response:
-"""
-
-def prompt_tok(instruction, inp, return_ids=False):
-    _p = prompt(instruction, inp)
-    input_ids = tokenizer(_p, return_tensors="pt", truncation=True).input_ids.cuda()
-    out_ids = model.generate(input_ids=input_ids, max_new_tokens=5000,
-        do_sample=False)
-    ids = out_ids.detach().cpu().numpy()
-    if return_ids: return out_ids
-
-    full_output = tokenizer.batch_decode(ids, skip_special_tokens=True)[0]
-    response_start = full_output.find("### Response:")
-    if response_start != -1:
-        return full_output[response_start + len("### Response:"):]
-    else:
-        return full_output[len(_p):]
-```
-
-Finally, you can get predictions like this:
-
-```python
-# model inputs
-instruction = "You are an assistant that takes a piece of text that has been corrupted during OCR digitisation, and produce a corrected version of the same text."
-inp = "Do Not Kule Oi't hy.er-l'rieed AjijqIi: imac - Analyst (fteuiers) Hcuiers - A | ) | ilf, <;/) in |) nter |iic . conic! deeiilf. l.o sell n lower-|)rieofl wersinn oi its Macintosh cornutor to nttinct ronsnnu-rs already euami'red ot its iPod music jiayo-r untl annoyoil. by sccnrit.y problems ivitJi Willtlows PCs , Piper.iaffray analyst. (Jcne Muster <aid on Tlinrtiday."
-
-# print prediction
-out = prompt_tok(instruction, inp)
-print(out.replace('\\', ' '))
-```
-
-This will give you a prediction that looks like this:
-
-```md
-"Do Not Rule Out Lower-Priced Mac - Analyst (Reuters) Reuters - Apple Inc. may be considering a lower-priced version of its Macintosh computer to attract consumers already enamored of its iPod music player and annoyed by security problems with Windows PCs, PiperJaffray analyst Gene Munster said on Thursday."
-```
-
-Alternatively, you can play with this model on Replicate: [tbc](tbc)


 ## Intended uses & limitations

-
-
-This model was intended to be used to restore historical documents that have been imperfectly digitalised using OCR.

 ## Training and evaluation data

-

 ## Training procedure

@@ -205,21 +142,20 @@ The following hyperparameters were used during training:

 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.1413 | 1.9547 | 120 | 0.1742 |


 ### Framework versions

 - PEFT 0.11.1
-- Transformers 4.
 - Pytorch 2.1.2+cu118
 - Datasets 2.19.1
 - Tokenizers 0.19.1

 ---
+base_model: meta-llama/Meta-Llama-3.1-8B
 library_name: peft
+license: llama3.1
 tags:
 - axolotl
 - generated_from_trainer
 model-index:
+- name: llama-3.1-8b-ocr-correction
 results: []
 ---


 axolotl version: `0.4.1`
 ```yaml
+base_model: meta-llama/Meta-Llama-3.1-8B
 model_type: AutoModelForCausalLM
 tokenizer_type: AutoTokenizer

 load_in_8bit: false
 load_in_4bit: true

 - path: ft_data/alpaca_data.jsonl
 type: alpaca
 dataset_prepared_path: last_run_prepared
+val_set_size: 0.05
 output_dir: ./qlora-alpaca-out
+hub_model_id: pbevan11/llama-3.1-8b-ocr-correction

 adapter: qlora
 lora_model_dir:

+sequence_len: 8192
 sample_packing: true
 pad_to_sequence_len: true


 wandb_project: ocr-ft
 wandb_entity: sncds
+wandb_name: llama31

 gradient_accumulation_steps: 4
 micro_batch_size: 2 # was 16

 </details><br>

+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/rotjhntf)
+# llama-3.1-8b-ocr-correction

+This model is a fine-tuned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) on the None dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.1901

+## Model description

+More information needed

 ## Intended uses & limitations

+More information needed

 ## Training and evaluation data

+More information needed

 ## Training procedure


 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
+| 0.61 | 0.0331 | 1 | 0.6018 |
+| 0.4379 | 0.2645 | 8 | 0.4256 |
+| 0.2531 | 0.5289 | 16 | 0.2714 |
+| 0.2366 | 0.7934 | 24 | 0.2247 |
+| 0.1839 | 1.0331 | 32 | 0.2053 |
+| 0.1752 | 1.2975 | 40 | 0.1961 |
+| 0.1629 | 1.5620 | 48 | 0.1909 |
+| 0.163 | 1.8264 | 56 | 0.1901 |


 ### Framework versions

 - PEFT 0.11.1
+- Transformers 4.43.2
 - Pytorch 2.1.2+cu118
 - Datasets 2.19.1
 - Tokenizers 0.19.1
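
The batch settings in the config above are the same in both runs, while `sequence_len` is now `8192`. A rough sketch of how much one optimizer step covers, assuming a single GPU (the GPU count is not recorded in this diff) and assuming that, with `sample_packing` and `pad_to_sequence_len`, each micro-batch row is one packed sequence of at most `sequence_len` tokens. The function names and the `world_size` parameter are hypothetical.

```python
def effective_batch(micro_batch_size: int, grad_accum_steps: int, world_size: int = 1) -> int:
    """Packed sequences contributing to one optimizer step (assumed formula)."""
    return micro_batch_size * grad_accum_steps * world_size


def max_tokens_per_step(micro_batch_size: int, grad_accum_steps: int,
                        sequence_len: int, world_size: int = 1) -> int:
    """Upper bound on token positions per optimizer step, if every row is packed to sequence_len."""
    return effective_batch(micro_batch_size, grad_accum_steps, world_size) * sequence_len


# Values from the new config: micro_batch_size: 2, gradient_accumulation_steps: 4, sequence_len: 8192
print(effective_batch(2, 4))            # 8
print(max_tokens_per_step(2, 4, 8192))  # 65536
```

On more than one GPU, both figures scale linearly with `world_size`.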
adapter_config.json
CHANGED
@@ -20,12 +20,12 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "up_proj",
-    "down_proj",
-    "gate_proj",
-    "k_proj",
     "v_proj",
     "q_proj",
     "o_proj"
   ],
   "task_type": "CAUSAL_LM",

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "v_proj",
     "q_proj",
+    "k_proj",
+    "down_proj",
+    "up_proj",
+    "gate_proj",
     "o_proj"
   ],
   "task_type": "CAUSAL_LM",
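
The `adapter_config.json` change only reorders `target_modules`. The same seven projection layers are adapted in both runs. A likely explanation, not confirmed by this commit, is that recent PEFT versions hold `target_modules` as a Python set internally, so the order written to disk can differ between runs. A small sketch that checks the two lists, copied from the diff above, name the same modules:

```python
# target_modules as listed in the old and new adapter_config.json
old_modules = ["up_proj", "down_proj", "gate_proj", "k_proj", "v_proj", "q_proj", "o_proj"]
new_modules = ["v_proj", "q_proj", "k_proj", "down_proj", "up_proj", "gate_proj", "o_proj"]


def same_targets(a, b):
    """True if both lists name the same modules, ignoring order and duplicates."""
    return set(a) == set(b)


print(same_targets(old_modules, new_modules))  # True
print(sorted(new_modules))
```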
adapter_model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
 size 167934026

 version https://git-lfs.github.com/spec/v1
+oid sha256:befe7ee91cb8ab62450880c1dabf645b053b56d4e5b4cf5a4776e29329224eeb
 size 167934026
adapter_model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
 size 167832688

 version https://git-lfs.github.com/spec/v1
+oid sha256:094a56bdc5ddb4b0283610f269f8a14fe9b93e86c16ad75b348c378b9c7405f6
 size 167832688
training_args.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
 size 6072

 version https://git-lfs.github.com/spec/v1
+oid sha256:823c026c21ead0a0fcfbdb2b1d26d1596e5af7ebb2cff85f40a3fbb177930914
 size 6072
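
`adapter_model.bin`, `adapter_model.safetensors` and `training_args.bin` are stored as Git LFS pointer files. In each pointer diff only the `oid` line changes, and `size` is identical, so the retrained files have the same byte count but different contents. A minimal sketch for reading such a pointer and checking a downloaded file against it; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and the parser assumes the three-line v1 layout shown above (`version`, `oid sha256:<hex>`, `size <bytes>`).

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer into version, hash algorithm, digest and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {"version": fields["version"], "algo": algo,
            "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict) -> bool:
    """Check a local file's size and SHA-256 digest against a parsed pointer."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MiB chunks so large adapter files are not read into memory at once
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


# New pointer for training_args.bin, copied from the diff above
pointer = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:823c026c21ead0a0fcfbdb2b1d26d1596e5af7ebb2cff85f40a3fbb177930914
size 6072
""")
print(pointer["algo"], pointer["size"])  # sha256 6072
```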