Yova/baseline

@@ -13,8 +13,8 @@ should probably proofread and complete it, then remove this comment. -->
 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
-- Loss: 1.2583
-- Exact Match: 0.0
 ## Model description
@@ -34,40 +34,20 @@ More information needed
 The following hyperparameters were used during training:
 - learning_rate: 0.001
-- train_batch_size: 100
 - eval_batch_size: 8
 - seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 400
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: inverse_sqrt
 - lr_scheduler_warmup_steps: 4000
-- num_epochs: 20
 ### Training results
-| Training Loss | Epoch | Step | Validation Loss | Exact Match |
-|:-------------:|:-----:|:----:|:---------------:|:-----------:|
-| 6.6009        | 1.0   | 25   | 5.6681          | 0.0         |
-| 5.9582        | 2.0   | 50   | 4.4844          | 0.0         |
-| 5.0101        | 3.0   | 75   | 3.7434          | 0.0         |
-| 4.3172        | 4.0   | 100  | 3.4055          | 0.0         |
-| 3.9672        | 5.0   | 125  | 3.2224          | 0.0         |
-| 3.7659        | 6.0   | 150  | 3.0110          | 0.0         |
-| 3.5416        | 7.0   | 175  | 2.7741          | 0.0         |
-| 3.3002        | 8.0   | 200  | 2.4967          | 0.0         |
-| 3.0412        | 9.0   | 225  | 2.2354          | 0.0         |
-| 2.8117        | 10.0  | 250  | 2.1391          | 0.0         |
-| 2.6168        | 11.0  | 275  | 2.0822          | 0.0         |
-| 2.4624        | 12.0  | 300  | 2.0147          | 0.0         |
-| 2.3253        | 13.0  | 325  | 1.9378          | 0.0         |
-| 2.2152        | 14.0  | 350  | 1.8335          | 0.0         |
-| 2.11          | 15.0  | 375  | 1.7586          | 0.0         |
-| 2.0029        | 16.0  | 400  | 1.6847          | 0.0         |
-| 1.9103        | 17.0  | 425  | 1.5874          | 0.0         |
-| 1.8166        | 18.0  | 450  | 1.5552          | 0.0         |
-| 1.7264        | 19.0  | 475  | 1.3748          | 0.0         |
-| 1.6562        | 20.0  | 500  | 1.2583          | 0.0         |
 ### Framework versions

 This model is a fine-tuned version of [](https://huggingface.co/) on an unknown dataset.
 It achieves the following results on the evaluation set:
+- Loss: 1.2427
+- Exact Match: 0.31
 ## Model description
 The following hyperparameters were used during training:
 - learning_rate: 0.001
+- train_batch_size: 400
 - eval_batch_size: 8
 - seed: 42
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: inverse_sqrt
 - lr_scheduler_warmup_steps: 4000
+- training_steps: 400
+- label_smoothing_factor: 0.1
 ### Training results
+| Training Loss | Epoch  | Step | Validation Loss | Exact Match |
+|:-------------:|:------:|:----:|:---------------:|:-----------:|
+| 2.9226        | 133.33 | 400  | 1.2427          | 0.31        |
 ### Framework versions
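
A note on the new run's schedule settings (learning_rate 0.001, inverse_sqrt, 4000 warmup steps, 400 training steps): the sketch below computes the learning rate at a given step. It is an assumption that the scheduler uses linear warmup followed by inverse-square-root decay with the decay timescale equal to the warmup length; the function name `inverse_sqrt_lr` is hypothetical and not taken from this repository.

```python
import math


def inverse_sqrt_lr(step, base_lr=0.001, warmup_steps=4000):
    """Learning rate at `step`: linear warmup, then inverse-sqrt decay.

    Assumes the decay timescale equals `warmup_steps`.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup period.
        return base_lr * step / max(1, warmup_steps)
    # After warmup, the rate decays proportionally to 1 / sqrt(step).
    return base_lr * math.sqrt(warmup_steps / step)


# Rate at the last step of the new 400-step run.
final_lr = inverse_sqrt_lr(400)
print(f"lr at step 400: {final_lr:.6f}")
```

Under these assumptions a 400-step run ends well inside the 4000-step warmup, so the rate would peak at roughly one tenth of the configured 0.001 and the decay phase would never be reached.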

generation_config.json CHANGED

@@ -1,7 +1,9 @@
 {
-  "decoder_start_token_id": 259,
-  "eos_token_id": 1,
-  "max_new_tokens": 20,
   "num_beams": 5,
   "pad_token_id": 0,
   "transformers_version": "4.35.2"

 {
+  "bos_token_id": 1,
+  "decoder_start_token_id": 1,
+  "early_stopping": true,
+  "eos_token_id": 2,
+  "max_new_tokens": 128,
   "num_beams": 5,
   "pad_token_id": 0,
   "transformers_version": "4.35.2"
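
To summarize the generation_config.json change, the sketch below copies the old and new values shown in the hunk into plain dictionaries and lists the keys that differ. The helper name `diff_config` is hypothetical; nothing here is loaded from the repository.

```python
# Values copied from the old and new sides of the hunk above.
old_config = {
    "decoder_start_token_id": 259,
    "eos_token_id": 1,
    "max_new_tokens": 20,
    "num_beams": 5,
    "pad_token_id": 0,
    "transformers_version": "4.35.2",
}
new_config = {
    "bos_token_id": 1,
    "decoder_start_token_id": 1,
    "early_stopping": True,
    "eos_token_id": 2,
    "max_new_tokens": 128,
    "num_beams": 5,
    "pad_token_id": 0,
    "transformers_version": "4.35.2",
}


def diff_config(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ.

    A key missing on one side is reported with None for that side.
    """
    changes = {}
    for key in sorted(set(old) | set(new)):
        if old.get(key) != new.get(key):
            changes[key] = (old.get(key), new.get(key))
    return changes


changes = diff_config(old_config, new_config)
for key, (before, after) in changes.items():
    print(f"{key}: {before!r} -> {after!r}")
```

The special token ids and the generation length change together here, so exact-match scores before and after this commit may not be directly comparable if the tokenizer changed as well.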