---
license: apache-2.0
base_model: t5-large
tags:
- generated_from_trainer
datasets:
- arrow
model-index:
- name: RoBERTa_T5_dependent_V1
  results: []
---

# RoBERTa_T5_dependent_V1

This model is a fine-tuned version of [t5-large](https://huggingface.co/t5-large) on the arrow dataset.
It achieves the following results on the evaluation set:
- Loss: 2.4943

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- total_eval_batch_size: 4
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 40

### Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| 9.2486        | 0.9963  | 68   | 4.2390          |
| 3.8105        | 1.9927  | 136  | 3.2877          |
| 3.2106        | 2.9890  | 204  | 2.9341          |
| 2.8659        | 4.0     | 273  | 2.7676          |
| 2.7264        | 4.9963  | 341  | 2.6706          |
| 2.5997        | 5.9927  | 409  | 2.6175          |
| 2.5016        | 6.9890  | 477  | 2.5781          |
| 2.3939        | 8.0     | 546  | 2.5520          |
| 2.3659        | 8.9963  | 614  | 2.5352          |
| 2.3132        | 9.9927  | 682  | 2.5209          |
| 2.2625        | 10.9890 | 750  | 2.5084          |
| 2.1871        | 12.0    | 819  | 2.5006          |
| 2.1777        | 12.9963 | 887  | 2.4977          |
| 2.1434        | 13.9927 | 955  | 2.4919          |
| 2.1108        | 14.9890 | 1023 | 2.4913          |
| 2.0496        | 16.0    | 1092 | 2.4878          |
| 2.0483        | 16.9963 | 1160 | 2.4907          |
| 2.0225        | 17.9927 | 1228 | 2.4924          |
| 1.9965        | 18.9890 | 1296 | 2.4926          |
| 1.9427        | 20.0    | 1365 | 2.4960          |
| 1.9492        | 20.9963 | 1433 | 2.4943          |

### Framework versions

- Transformers 4.40.1
- Pytorch 2.2.1+cu121
- Datasets 2.17.1
- Tokenizers 0.19.1
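
### Checking the reported numbers

The sketch below is not part of the original training code. It re-derives two figures from this card: `total_train_batch_size` from the per-device batch size, device count, and gradient accumulation steps, and the row of the training results table with the lowest validation loss. The variable names and the `results` list are illustrative; the values are copied from the hyperparameter list and the table.

```python
# Hyperparameters copied from the "Training hyperparameters" list.
train_batch_size = 1              # per-device batch size
num_devices = 4
gradient_accumulation_steps = 4

# Effective batch size per optimizer step.
total_train_batch_size = (
    train_batch_size * num_devices * gradient_accumulation_steps
)

# (epoch, step, validation_loss) rows copied from the "Training results" table.
results = [
    (0.9963, 68, 4.2390), (1.9927, 136, 3.2877), (2.9890, 204, 2.9341),
    (4.0, 273, 2.7676), (4.9963, 341, 2.6706), (5.9927, 409, 2.6175),
    (6.9890, 477, 2.5781), (8.0, 546, 2.5520), (8.9963, 614, 2.5352),
    (9.9927, 682, 2.5209), (10.9890, 750, 2.5084), (12.0, 819, 2.5006),
    (12.9963, 887, 2.4977), (13.9927, 955, 2.4919), (14.9890, 1023, 2.4913),
    (16.0, 1092, 2.4878), (16.9963, 1160, 2.4907), (17.9927, 1228, 2.4924),
    (18.9890, 1296, 2.4926), (20.0, 1365, 2.4960), (20.9963, 1433, 2.4943),
]

# Row with the lowest validation loss, and the last logged row.
best_epoch, best_step, best_val_loss = min(results, key=lambda row: row[2])
final_epoch, final_step, final_val_loss = results[-1]

print(f"total_train_batch_size = {total_train_batch_size}")
print(f"best validation loss   = {best_val_loss} at step {best_step} (epoch {best_epoch})")
print(f"final validation loss  = {final_val_loss} at step {final_step} (epoch {final_epoch})")
```

The validation loss reported at the top of the card (2.4943) is the last logged row, while the lowest value in the table (2.4878) occurs at step 1092. The card does not say which checkpoint was saved, so treat the choice between the two as an open question when comparing against this model.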