	---
	license: mit
	base_model: facebook/bart-large-cnn
	tags:
	- generated_from_trainer
	model-index:
	- name: bart_samsum_v2
	  results: []
	---

	# bart_samsum_v2

	This model is a fine-tuned version of [facebook/bart-large-cnn](https://huggingface.co/facebook/bart-large-cnn). The Trainer did not record the name of the training dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.0236

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 4
	- eval_batch_size: 4
	- seed: 42
	- gradient_accumulation_steps: 16
	- total_train_batch_size: 64
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 8
	- num_epochs: 15
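
The relationship between these values can be checked with a short sketch. It assumes a single training device (the card does not state the device count) and takes 75, the last step logged in the training results table, as the total number of optimizer steps. The helper names `effective_batch_size` and `linear_lr` are hypothetical and written for illustration; `linear_lr` mirrors the usual linear-warmup-then-linear-decay shape and is not taken from the original training script.

```python
# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 5e-05
TRAIN_BATCH_SIZE = 4
GRADIENT_ACCUMULATION_STEPS = 16
WARMUP_STEPS = 8

# Assumptions (not stated in the card): one device, and 75 total optimizer
# steps, which is the last step logged in the training results table.
NUM_DEVICES = 1
TOTAL_STEPS = 75


def effective_batch_size(per_device, accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer step."""
    return per_device * accumulation_steps * num_devices


def linear_lr(step, base_lr=LEARNING_RATE,
              warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Learning rate at a given optimizer step for a linear schedule.

    The rate rises linearly from 0 to base_lr over the warmup steps,
    then decays linearly to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # 4 examples per batch * 16 accumulated batches * 1 device = 64
    print(effective_batch_size(TRAIN_BATCH_SIZE,
                               GRADIENT_ACCUMULATION_STEPS,
                               NUM_DEVICES))
    for step in (0, 4, 8, 40, 75):
        print(step, linear_lr(step))
```

Under these assumptions the learning rate peaks at step 8 and reaches zero at step 75, so the small loss changes logged near the end of training coincide with a very small learning rate.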

	### Training results

	| Training Loss | Epoch | Step | Validation Loss |
	|:-------------:|:-----:|:----:|:---------------:|
	| 9.4233 | 0.17 | 1 | 9.1990 |
	| 9.5213 | 0.34 | 2 | 8.5394 |
	| 8.7467 | 0.52 | 3 | 8.1115 |
	| 8.4697 | 0.69 | 4 | 7.5747 |
	| 7.752 | 0.86 | 5 | 6.8712 |
	| 7.0515 | 1.03 | 6 | 5.8670 |
	| 6.0874 | 1.2 | 7 | 4.6814 |
	| 5.0408 | 1.38 | 8 | 3.8055 |
	| 4.14 | 1.55 | 9 | 2.6678 |
	| 2.9893 | 1.72 | 10 | 1.9701 |
	| 2.4337 | 1.89 | 11 | 1.5191 |
	| 1.9451 | 2.06 | 12 | 1.2105 |
	| 1.53 | 2.24 | 13 | 0.9714 |
	| 1.2369 | 2.41 | 14 | 0.7905 |
	| 1.0014 | 2.58 | 15 | 0.6478 |
	| 0.8419 | 2.75 | 16 | 0.5493 |
	| 0.7338 | 2.92 | 17 | 0.4770 |
	| 0.6393 | 3.1 | 18 | 0.4151 |
	| 0.5747 | 3.27 | 19 | 0.3691 |
	| 0.4962 | 3.44 | 20 | 0.3293 |
	| 0.4516 | 3.61 | 21 | 0.2935 |
	| 0.3995 | 3.78 | 22 | 0.2614 |
	| 0.3618 | 3.96 | 23 | 0.2346 |
	| 0.3246 | 4.13 | 24 | 0.2129 |
	| 0.2929 | 4.3 | 25 | 0.1938 |
	| 0.278 | 4.47 | 26 | 0.1770 |
	| 0.2493 | 4.65 | 27 | 0.1627 |
	| 0.2273 | 4.82 | 28 | 0.1500 |
	| 0.2067 | 4.99 | 29 | 0.1381 |
	| 0.1917 | 5.16 | 30 | 0.1274 |
	| 0.1805 | 5.33 | 31 | 0.1174 |
	| 0.1557 | 5.51 | 32 | 0.1081 |
	| 0.1495 | 5.68 | 33 | 0.1002 |
	| 0.1394 | 5.85 | 34 | 0.0933 |
	| 0.1261 | 6.02 | 35 | 0.0868 |
	| 0.1155 | 6.19 | 36 | 0.0809 |
	| 0.1114 | 6.37 | 37 | 0.0755 |
	| 0.1041 | 6.54 | 38 | 0.0705 |
	| 0.0952 | 6.71 | 39 | 0.0657 |
	| 0.0881 | 6.88 | 40 | 0.0615 |
	| 0.0823 | 7.05 | 41 | 0.0577 |
	| 0.0778 | 7.23 | 42 | 0.0545 |
	| 0.071 | 7.4 | 43 | 0.0515 |
	| 0.07 | 7.57 | 44 | 0.0487 |
	| 0.0625 | 7.74 | 45 | 0.0463 |
	| 0.0589 | 7.91 | 46 | 0.0440 |
	| 0.0567 | 8.09 | 47 | 0.0422 |
	| 0.0537 | 8.26 | 48 | 0.0411 |
	| 0.05 | 8.43 | 49 | 0.0398 |
	| 0.0472 | 8.6 | 50 | 0.0384 |
	| 0.0458 | 8.77 | 51 | 0.0363 |
	| 0.0455 | 8.95 | 52 | 0.0347 |
	| 0.0412 | 9.12 | 53 | 0.0340 |
	| 0.0414 | 9.29 | 54 | 0.0326 |
	| 0.0403 | 9.46 | 55 | 0.0333 |
	| 0.0384 | 9.63 | 56 | 0.0303 |
	| 0.0353 | 9.81 | 57 | 0.0298 |
	| 0.0348 | 9.98 | 58 | 0.0293 |
	| 0.0342 | 10.15 | 59 | 0.0275 |
	| 0.0311 | 10.32 | 60 | 0.0272 |
	| 0.0317 | 10.49 | 61 | 0.0270 |
	| 0.0315 | 10.67 | 62 | 0.0261 |
	| 0.0289 | 10.84 | 63 | 0.0253 |
	| 0.0285 | 11.01 | 64 | 0.0247 |
	| 0.0273 | 11.18 | 65 | 0.0244 |
	| 0.0277 | 11.35 | 66 | 0.0240 |
	| 0.0267 | 11.53 | 67 | 0.0237 |
	| 0.0263 | 11.7 | 68 | 0.0237 |
	| 0.0258 | 11.87 | 69 | 0.0237 |
	| 0.0254 | 12.04 | 70 | 0.0238 |
	| 0.0248 | 12.22 | 71 | 0.0239 |
	| 0.0246 | 12.39 | 72 | 0.0239 |
	| 0.0249 | 12.56 | 73 | 0.0237 |
	| 0.0239 | 12.73 | 74 | 0.0236 |
	| 0.0247 | 12.9 | 75 | 0.0236 |


	### Framework versions

	- Transformers 4.38.1
	- PyTorch 2.1.0+cu121
	- Datasets 2.17.1
	- Tokenizers 0.15.2
