laurentiubp committed
Commit 05fc736 • Parent(s): 67a0ba7
Update README.md

README.md CHANGED
@@ -32,6 +32,8 @@ The model shows improved proficiency with the Catalan language.
 The model achieves a loss rate of 0.8528 on the validation dataset after two epochs.

 **Model developers** [Laurentiu Petrea](https://www.linkedin.com/in/laurentiupetrea/) based on Llama-3 from Meta.

@@ -84,7 +86,7 @@ print(outputs[0]["generated_text"][len(prompt):])
 The model was trained **with the same prompt template of Llama-3 Instruct**.

-The model was trained for two epochs on 6x A100 80GB GPUs using DeepSpeed ZeRO State-3 without CPU offloading.

@@ -99,16 +101,29 @@ The following hyperparameters were used during training:
 ### Training results

 ## Intended Use
The model achieves a loss of 0.8528 on the validation dataset after two epochs.

**NOTE:** The model was trained for one epoch, then the `train` split of the dataset was shuffled and the model was trained for another epoch.

**Model developers** [Laurentiu Petrea](https://www.linkedin.com/in/laurentiupetrea/) based on Llama-3 from Meta.
The model was trained **with the same prompt template as Llama-3 Instruct**.
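For reference, the Llama-3 Instruct template wraps each turn in header tokens and ends with an open assistant header; a minimal sketch of building a single-turn prompt by hand (the Catalan system prompt is an illustrative placeholder):

```python
def format_llama3_prompt(system: str, user: str) -> str:
    """Build a single-turn prompt in the Llama-3 Instruct format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The open assistant header cues the model to generate its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama3_prompt("Ets un assistent útil.", "Qui ets?")
```

In practice, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` from `transformers` produces this string from a list of role/content messages, so the template never needs to be written out by hand.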
The model was trained for two epochs on **6x A100 80GB GPUs using DeepSpeed ZeRO Stage 3** without CPU offloading.
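A DeepSpeed configuration matching that description — ZeRO Stage 3 with no CPU offloading — would look roughly like the sketch below; everything except the stage and the absence of `offload_*` sections is an illustrative placeholder:

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": "auto" },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```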
### Training hyperparameters
### Training results

**Epoch 1**

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.0938 | 0.11 | 100 | 1.0779 |
| 1.0186 | 0.22 | 200 | 1.0209 |
| 1.0157 | 0.32 | 300 | 0.9808 |
| 0.9588 | 0.43 | 400 | 0.9489 |
| 0.9039 | 0.54 | 500 | 0.9244 |
| 0.9111 | 0.65 | 600 | 0.9086 |
| 0.8918 | 0.75 | 700 | 0.8961 |
| 0.8971 | 0.86 | 800 | 0.8886 |
| 0.8631 | 0.97 | 900 | 0.8846 |

**Epoch 2**

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.8002 | 0.22 | 200 | 0.8989 |
| 0.8068 | 0.43 | 400 | 0.8835 |
| 0.7722 | 0.65 | 600 | 0.8654 |
| 0.7805 | 0.86 | 800 | 0.8528 |

## Intended Use