End of training

Files changed:
- README.md: +27 -30
- adapter_model.bin: +1 -1
README.md CHANGED

````diff
@@ -1,12 +1,12 @@
 ---
-base_model: meta-llama/Meta-Llama-3
+base_model: meta-llama/Meta-Llama-3-8B
 library_name: peft
-license: llama3
+license: llama3
 tags:
 - axolotl
 - generated_from_trainer
 model-index:
-- name: llama-3
+- name: llama-3-8b-ocr-correction
   results: []
 ---

@@ -18,7 +18,7 @@ should probably proofread and complete it, then remove this comment. -->

 axolotl version: `0.4.1`
 ```yaml
-base_model: meta-llama/Meta-Llama-3
+base_model: meta-llama/Meta-Llama-3-8B
 model_type: AutoModelForCausalLM
 tokenizer_type: AutoTokenizer

@@ -34,14 +34,14 @@ datasets:
   - path: ft_data/alpaca_data.jsonl
     type: alpaca
 dataset_prepared_path: last_run_prepared
-val_set_size: 0.
+val_set_size: 0.1
 output_dir: ./qlora-alpaca-out
-hub_model_id: pbevan11/llama-3
+hub_model_id: pbevan11/llama-3-8b-ocr-correction

 adapter: qlora
 lora_model_dir:

-sequence_len:
+sequence_len: 4096
 sample_packing: true
 pad_to_sequence_len: true

@@ -51,22 +51,15 @@ lora_dropout: 0.05
 lora_target_linear: true
 lora_fan_in_fan_out:
 lora_target_modules:
-  - gate_proj
-  - down_proj
-  - up_proj
-  - q_proj
-  - v_proj
-  - k_proj
-  - o_proj

 wandb_project: ocr-ft
 wandb_entity: sncds
-wandb_name:
+wandb_name: test

 gradient_accumulation_steps: 4
 micro_batch_size: 2 # was 16
 eval_batch_size: 2 # was 16
-num_epochs:
+num_epochs: 3
 optimizer: paged_adamw_32bit
 lr_scheduler: cosine
 learning_rate: 0.0002
@@ -103,12 +96,12 @@ special_tokens:

 </details><br>

-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://wandb.ai/sncds/ocr-ft/runs/4fhldwb5)
-# llama-3
+# llama-3-8b-ocr-correction

-This model is a fine-tuned version of [meta-llama/Meta-Llama-3
+This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) on the None dataset.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.1778

 ## Model description

@@ -136,26 +129,30 @@ The following hyperparameters were used during training:
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs:
+- num_epochs: 3

 ### Training results

 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
+| 0.5646        | 0.0174 | 1    | 0.6286          |
+| 0.3257        | 0.2609 | 15   | 0.2889          |
+| 0.2285        | 0.5217 | 30   | 0.2171          |
+| 0.1727        | 0.7826 | 45   | 0.1910          |
+| 0.1497        | 1.0174 | 60   | 0.1792          |
+| 0.1545        | 1.2783 | 75   | 0.1758          |
+| 0.1317        | 1.5391 | 90   | 0.1738          |
+| 0.1256        | 1.8    | 105  | 0.1699          |
+| 0.0941        | 2.0348 | 120  | 0.1676          |
+| 0.0723        | 2.2957 | 135  | 0.1783          |
+| 0.07          | 2.5565 | 150  | 0.1779          |
+| 0.073         | 2.8174 | 165  | 0.1778          |


 ### Framework versions

 - PEFT 0.11.1
-- Transformers 4.
+- Transformers 4.42.3
 - Pytorch 2.1.2+cu118
 - Datasets 2.19.1
 - Tokenizers 0.19.1
````
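The updated card describes a QLoRA adapter trained with axolotl on Alpaca-formatted OCR-correction data. Below is a minimal sketch of using the result; the repo id `pbevan11/llama-3-8b-ocr-correction` and base model `meta-llama/Meta-Llama-3-8B` come from the card above, while the instruction text is only an illustrative placeholder.

```python
# Minimal sketch: load the QLoRA adapter from this repo on top of the base model.
# Repo ids are taken from the model card above; the prompt below is a placeholder.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B"
adapter_id = "pbevan11/llama-3-8b-ocr-correction"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")

# Attach the PEFT (LoRA) adapter weights stored in adapter_model.bin.
model = PeftModel.from_pretrained(base_model, adapter_id)

# The config above uses `type: alpaca`, so an Alpaca-style instruction prompt is a
# reasonable default; the exact instruction wording here is assumed, not from the card.
prompt = (
    "### Instruction:\nCorrect the OCR errors in the following text.\n\n"
    "### Input:\nTh1s is an examp1e sentenc3 with OCR n0ise.\n\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Loading the adapter this way leaves the base weights untouched; `merge_and_unload()` could be called afterwards if a standalone merged model is preferred.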
adapter_model.bin CHANGED

````diff
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:5c28303892a6636295f8e3b90fae48da861a566c88260c5f90bfd4f586492399
 size 167934026
````
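The `adapter_model.bin` change only swaps the Git LFS pointer: the repository stores the SHA-256 oid and byte size shown above, while the weights themselves live in LFS storage. A rough sketch of checking a downloaded copy against that pointer, assuming `huggingface_hub` is available and using the repo id from the card:

```python
# Sketch: verify a downloaded adapter_model.bin against the LFS pointer above.
# The expected oid and size are taken from the pointer file in this commit.
import hashlib
import os

from huggingface_hub import hf_hub_download

EXPECTED_SHA256 = "5c28303892a6636295f8e3b90fae48da861a566c88260c5f90bfd4f586492399"
EXPECTED_SIZE = 167934026

path = hf_hub_download("pbevan11/llama-3-8b-ocr-correction", "adapter_model.bin")

# Compare the file size first (cheap), then the SHA-256 digest (what the LFS oid encodes).
assert os.path.getsize(path) == EXPECTED_SIZE, "size mismatch"

sha = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):
        sha.update(chunk)

assert sha.hexdigest() == EXPECTED_SHA256, "checksum mismatch"
print("adapter_model.bin matches the LFS pointer")
```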