Yova/baseline

@@ -13,8 +13,8 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.7391
-- Exact Match: 0.0
 ## Model description
@@ -34,37 +34,40 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 0.001
-- train_batch_size: 64
 - eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
 - num_epochs: 20
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Exact Match |
 |:-------------:|:-----:|:----:|:---------------:|:-----------:|
-| 2.0707        | 1.0   | 157  | 1.4872          | 0.003       |
-| 1.0627        | 2.0   | 314  | 1.2868          | 0.007       |
-| 0.8185        | 3.0   | 471  | 1.2750          | 0.0         |
-| 0.6815        | 4.0   | 628  | 1.4788          | 0.0         |
-| 0.6021        | 5.0   | 785  | 1.3314          | 0.003       |
-| 0.5331        | 6.0   | 942  | 1.4174          | 0.0         |
-| 0.481         | 7.0   | 1099 | 1.5780          | 0.0         |
-| 0.442         | 8.0   | 1256 | 1.4797          | 0.0         |
-| 0.4027        | 9.0   | 1413 | 1.5298          | 0.0         |
-| 0.3629        | 10.0  | 1570 | 1.5906          | 0.0         |
-| 0.3313        | 11.0  | 1727 | 1.6076          | 0.0         |
-| 0.3031        | 12.0  | 1884 | 1.7169          | 0.0         |
-| 0.269         | 13.0  | 2041 | 1.5874          | 0.0         |
-| 0.2433        | 14.0  | 2198 | 1.7706          | 0.0         |
-| 0.2079        | 15.0  | 2355 | 1.6666          | 0.0         |
-| 0.1772        | 16.0  | 2512 | 1.6823          | 0.0         |
-| 0.1549        | 17.0  | 2669 | 1.7645          | 0.0         |
-| 0.1345        | 18.0  | 2826 | 1.7436          | 0.0         |
-| 0.124         | 19.0  | 2983 | 1.7963          | 0.0         |
-| 0.1166        | 20.0  | 3140 | 1.7391          | 0.0         |
 ### Framework versions

 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 0.5400
+- Exact Match: 0.21
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 0.001
+- train_batch_size: 100
 - eval_batch_size: 8
 - seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 400
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: inverse_sqrt
+- lr_scheduler_warmup_steps: 4000
 - num_epochs: 20
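
The relationship between the updated batch settings and the new scheduler can be checked with a little arithmetic. The sketch below is illustrative only: it assumes a single training device (so that 100 × 4 = 400 matches the reported `total_train_batch_size`) and assumes the common form of an inverse-square-root schedule, linear warmup followed by decay proportional to `1/sqrt(step)`. The helper names `effective_batch_size` and `inverse_sqrt_lr` are hypothetical and are not taken from the training script.

```python
import math

# Values copied from the updated hyperparameter list and results table.
BASE_LR = 1e-3
TRAIN_BATCH_SIZE = 100
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 4000
STEPS_PER_EPOCH = 25  # Step column: 25 optimizer steps per epoch
NUM_EPOCHS = 20


def effective_batch_size(per_step_batch, accum_steps, num_devices=1):
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_step_batch * accum_steps * num_devices


def inverse_sqrt_lr(step, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Assumed schedule: linear warmup, then base_lr * sqrt(warmup_steps / step)."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * math.sqrt(warmup_steps / step)


total_steps = STEPS_PER_EPOCH * NUM_EPOCHS

print("effective batch size:", effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
print("total optimizer steps:", total_steps)
print("learning rate at final step:", inverse_sqrt_lr(total_steps))
```

Under these assumptions the run stops at step 500, well inside the 4000-step warmup, so the learning rate would still be rising when training ends and would reach only about one eighth of the nominal 0.001.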
 ### Training results
 | Training Loss | Epoch | Step | Validation Loss | Exact Match |
 |:-------------:|:-----:|:----:|:---------------:|:-----------:|
+| 5.8887        | 1.0   | 25   | 5.4437          | 0.0         |
+| 5.136         | 2.0   | 50   | 4.6502          | 0.0         |
+| 4.4735        | 3.0   | 75   | 4.0750          | 0.0         |
+| 4.0514        | 4.0   | 100  | 3.6513          | 0.0         |
+| 3.6137        | 5.0   | 125  | 3.2326          | 0.0         |
+| 3.1753        | 6.0   | 150  | 2.8805          | 0.0         |
+| 2.8265        | 7.0   | 175  | 2.6956          | 0.0         |
+| 2.5627        | 8.0   | 200  | 2.4217          | 0.0         |
+| 2.3479        | 9.0   | 225  | 2.1641          | 0.0         |
+| 2.152         | 10.0  | 250  | 1.9763          | 0.0         |
+| 1.9774        | 11.0  | 275  | 1.7821          | 0.02        |
+| 1.8176        | 12.0  | 300  | 1.6510          | 0.02        |
+| 1.6546        | 13.0  | 325  | 1.4170          | 0.04        |
+| 1.5235        | 14.0  | 350  | 1.3487          | 0.04        |
+| 1.3971        | 15.0  | 375  | 1.1213          | 0.13        |
+| 1.2849        | 16.0  | 400  | 0.9549          | 0.13        |
+| 1.1762        | 17.0  | 425  | 0.8649          | 0.19        |
+| 1.0809        | 18.0  | 450  | 0.7560          | 0.19        |
+| 0.9945        | 19.0  | 475  | 0.6806          | 0.2         |
+| 0.9168        | 20.0  | 500  | 0.5400          | 0.21        |
 ### Framework versions

generation_config.json

@@ -2,6 +2,7 @@
   "decoder_start_token_id": 259,
   "eos_token_id": 1,
   "max_new_tokens": 20,
   "pad_token_id": 0,
   "transformers_version": "4.35.2"
 }

   "decoder_start_token_id": 259,
   "eos_token_id": 1,
   "max_new_tokens": 20,
+  "num_beams": 5,
   "pad_token_id": 0,
   "transformers_version": "4.35.2"
 }
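
The only change in this hunk is the added `num_beams` key. In `transformers`, `num_beams` defaults to 1, so setting it to 5 without sampling moves generation from greedy decoding to beam search. As a small sketch, the two versions can be compared with the standard library alone. Only the keys visible in the hunk are reproduced here; any lines outside the hunk are not shown in the diff and are omitted. The helper `changed_keys` is hypothetical.

```python
import json

# Keys visible in the hunk before the change.
old_config = json.loads("""
{
  "decoder_start_token_id": 259,
  "eos_token_id": 1,
  "max_new_tokens": 20,
  "pad_token_id": 0,
  "transformers_version": "4.35.2"
}
""")

# Keys visible in the hunk after the change.
new_config = json.loads("""
{
  "decoder_start_token_id": 259,
  "eos_token_id": 1,
  "max_new_tokens": 20,
  "num_beams": 5,
  "pad_token_id": 0,
  "transformers_version": "4.35.2"
}
""")


def changed_keys(old, new):
    """Map each added, removed, or modified key to (old_value, new_value)."""
    keys = old.keys() | new.keys()
    return {k: (old.get(k), new.get(k)) for k in sorted(keys) if old.get(k) != new.get(k)}


print(changed_keys(old_config, new_config))  # {'num_beams': (None, 5)}
```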