laurentiubp committed
Commit 05fc736 • Parent(s): 67a0ba7
Update README.md

README.md CHANGED
@@ -32,6 +32,8 @@ The model shows improved proficiency with the Catalan language.
 The model achieves a loss rate of 0.8528 on the validation dataset after two epochs.

 **Model developers** [Laurentiu Petrea](https://www.linkedin.com/in/laurentiupetrea/) based on Llama-3 from Meta.

@@ -84,7 +86,7 @@ print(outputs[0]["generated_text"][len(prompt):])
 The model was trained **with the same prompt template of Llama-3 Instruct**.

-The model was trained for two epochs on 6x A100 80GB GPUs using DeepSpeed ZeRO State-3 without CPU offloading.

@@ -99,16 +101,29 @@ The following hyperparameters were used during training:
 ### Training results

 ## Intended Use
The model achieves a loss of 0.8528 on the validation dataset after two epochs.

**NOTE:** The model was trained for one epoch, then the `train` split of the dataset was shuffled and the model was trained for another epoch.

**Model developers** [Laurentiu Petrea](https://www.linkedin.com/in/laurentiupetrea/) based on Llama-3 from Meta.
The model was trained **with the same prompt template as Llama-3 Instruct**.
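For reference, the Llama-3 Instruct template wraps each turn in header tokens and ends with an open assistant header; a minimal sketch of building a single-turn prompt by hand (the Catalan system prompt is an illustrative placeholder):

```python
def format_llama3_prompt(system: str, user: str) -> str:
    """Build a single-turn prompt in the Llama-3 Instruct format."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        # The open assistant header cues the model to generate its reply.
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_llama3_prompt("Ets un assistent útil.", "Qui ets?")
```

In practice, `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` from `transformers` produces this string from a list of role/content messages, so the template never needs to be written out by hand.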
The model was trained for two epochs on **6x A100 80GB GPUs using DeepSpeed ZeRO Stage 3** without CPU offloading.
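A DeepSpeed configuration matching that description — ZeRO Stage 3 with no CPU offloading — would look roughly like the sketch below; everything except the stage and the absence of `offload_*` sections is an illustrative placeholder:

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "bf16": { "enabled": "auto" },
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```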
### Training hyperparameters
### Training results

**Epoch 1**

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 1.0938 | 0.11 | 100 | 1.0779 |
| 1.0186 | 0.22 | 200 | 1.0209 |
| 1.0157 | 0.32 | 300 | 0.9808 |
| 0.9588 | 0.43 | 400 | 0.9489 |
| 0.9039 | 0.54 | 500 | 0.9244 |
| 0.9111 | 0.65 | 600 | 0.9086 |
| 0.8918 | 0.75 | 700 | 0.8961 |
| 0.8971 | 0.86 | 800 | 0.8886 |
| 0.8631 | 0.97 | 900 | 0.8846 |

**Epoch 2**

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.8002 | 0.22 | 200 | 0.8989 |
| 0.8068 | 0.43 | 400 | 0.8835 |
| 0.7722 | 0.65 | 600 | 0.8654 |
| 0.7805 | 0.86 | 800 | 0.8528 |

## Intended Use