	---
	base_model: google-bert/bert-base-uncased
	library_name: peft
	license: apache-2.0
	metrics:
	- accuracy
	tags:
	- trl
	- sft
	- generated_from_trainer
	model-index:
	- name: output
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# output

	This model is a fine-tuned version of [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) on the [SWAG](https://huggingface.co/datasets/allenai/swag) dataset.
	It achieves the following results on the evaluation set:
	- Loss: 0.6749
	- Accuracy: 0.7503

	## Model description

	More information needed

	## Intended uses & limitations

	This model should be used as an expert in the [Meteor-of-LoRA framework](https://github.com/ParagonLight/meteor-of-lora).

	## Training and evaluation data

	The data were split according to the default splits of the Hugging Face dataset:

	```python3
	from datasets import load_dataset
	dataset = load_dataset("swag")  # returns the dataset's predefined splits
	```
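
For orientation, each SWAG record pairs a context with four candidate endings and a label. The sketch below shows one common way to expand such a record into four (context, continuation) pairs for a multiple-choice model. The record is made up for illustration, the field names (`sent1`, `sent2`, `ending0` to `ending3`, `label`) are assumed to follow the dataset's `regular` configuration, and `expand_record` is a hypothetical helper. The card does not state how the inputs for this model were actually built.

```python
def expand_record(record):
    """Return four (context, continuation) pairs, one per candidate ending."""
    pairs = []
    for i in range(4):
        ending = record["ending" + str(i)]
        # sent2 is the start of the second sentence; each ending completes it.
        continuation = record["sent2"] + " " + ending
        pairs.append((record["sent1"], continuation))
    return pairs


# Made-up record, for illustration only (not taken from the dataset).
example = {
    "sent1": "A man opens the oven door.",
    "sent2": "He",
    "ending0": "takes out a tray of cookies.",
    "ending1": "swims across the lake.",
    "ending2": "paints the ceiling blue.",
    "ending3": "rides a horse into the kitchen.",
    "label": 0,
}

pairs = expand_record(example)
correct_pair = pairs[example["label"]]
```

A multiple-choice model would score all four pairs and predict the index with the highest score; accuracy is the fraction of records where that index equals `label`.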

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 5e-05
	- train_batch_size: 16
	- eval_batch_size: 16
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 5
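
As a rough illustration of what `lr_scheduler_type: linear` means for these settings, the sketch below computes the learning rate at a given optimizer step. It assumes no warmup and about 4597 steps per epoch; neither is stated in the card, and the step count is only an estimate inferred from the results table (step 500 is logged at epoch 0.1088). `linear_lr` is a hypothetical helper, not part of any library.

```python
LEARNING_RATE = 5e-5     # from the hyperparameter list
NUM_EPOCHS = 5           # from the hyperparameter list
STEPS_PER_EPOCH = 4597   # assumption: estimated from the results table
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH


def linear_lr(step, warmup_steps=0):
    """Learning rate under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp up from 0 to LEARNING_RATE during warmup.
        return LEARNING_RATE * step / max(1, warmup_steps)
    # Decay linearly from LEARNING_RATE to 0 over the remaining steps.
    remaining = max(0, TOTAL_STEPS - step)
    return LEARNING_RATE * remaining / max(1, TOTAL_STEPS - warmup_steps)


# Learning rate at a few points of the run (no warmup assumed).
for step in (0, 5000, 10000, 22500):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at 5e-05 and falls to roughly 2% of that value by the last logged step (22500).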

	### Training results

	| Training Loss | Epoch  | Step  | Validation Loss | Accuracy |
	|:-------------:|:------:|:-----:|:---------------:|:--------:|
	| 1.3807        | 0.1088 | 500   | 1.2507          | 0.6138   |
	| 1.1949        | 0.2175 | 1000  | 1.0938          | 0.5737   |
	| 1.132         | 0.3263 | 1500  | 1.0330          | 0.5657   |
	| 1.0348        | 0.4351 | 2000  | 0.9162          | 0.6440   |
	| 1.0008        | 0.5438 | 2500  | 0.8464          | 0.6801   |
	| 0.9609        | 0.6526 | 3000  | 0.8267          | 0.6859   |
	| 0.9454        | 0.7614 | 3500  | 0.8116          | 0.6943   |
	| 0.9512        | 0.8701 | 4000  | 0.8125          | 0.6955   |
	| 0.9367        | 0.9789 | 4500  | 0.7838          | 0.7032   |
	| 0.9205        | 1.0877 | 5000  | 0.7861          | 0.7044   |
	| 0.9189        | 1.1964 | 5500  | 0.7713          | 0.7088   |
	| 0.8975        | 1.3052 | 6000  | 0.7538          | 0.7173   |
	| 0.9065        | 1.4140 | 6500  | 0.7520          | 0.7175   |
	| 0.8957        | 1.5227 | 7000  | 0.7513          | 0.7200   |
	| 0.8768        | 1.6315 | 7500  | 0.7411          | 0.7195   |
	| 0.8858        | 1.7403 | 8000  | 0.7306          | 0.7262   |
	| 0.875         | 1.8490 | 8500  | 0.7302          | 0.7268   |
	| 0.8649        | 1.9578 | 9000  | 0.7229          | 0.7303   |
	| 0.8653        | 2.0666 | 9500  | 0.7126          | 0.7322   |
	| 0.867         | 2.1753 | 10000 | 0.7198          | 0.7293   |
	| 0.868         | 2.2841 | 10500 | 0.7125          | 0.7346   |
	| 0.855         | 2.3929 | 11000 | 0.7051          | 0.7350   |
	| 0.8557        | 2.5016 | 11500 | 0.7008          | 0.7384   |
	| 0.8622        | 2.6104 | 12000 | 0.6979          | 0.7389   |
	| 0.8506        | 2.7192 | 12500 | 0.7068          | 0.7378   |
	| 0.8558        | 2.8279 | 13000 | 0.7082          | 0.7337   |
	| 0.849         | 2.9367 | 13500 | 0.6978          | 0.7407   |
	| 0.8581        | 3.0455 | 14000 | 0.6850          | 0.7460   |
	| 0.8521        | 3.1542 | 14500 | 0.6945          | 0.7428   |
	| 0.8454        | 3.2630 | 15000 | 0.6863          | 0.7446   |
	| 0.8257        | 3.3718 | 15500 | 0.6917          | 0.7414   |
	| 0.8522        | 3.4805 | 16000 | 0.6882          | 0.7445   |
	| 0.8359        | 3.5893 | 16500 | 0.6845          | 0.7442   |
	| 0.8238        | 3.6981 | 17000 | 0.6863          | 0.7441   |
	| 0.8382        | 3.8068 | 17500 | 0.6937          | 0.7438   |
	| 0.8326        | 3.9156 | 18000 | 0.6780          | 0.7488   |
	| 0.8344        | 4.0244 | 18500 | 0.6775          | 0.7484   |
	| 0.8224        | 4.1331 | 19000 | 0.6811          | 0.7477   |
	| 0.8261        | 4.2419 | 19500 | 0.6797          | 0.7480   |
	| 0.8256        | 4.3507 | 20000 | 0.6815          | 0.7481   |
	| 0.8191        | 4.4594 | 20500 | 0.6788          | 0.7476   |
	| 0.838         | 4.5682 | 21000 | 0.6802          | 0.7490   |
	| 0.8383        | 4.6770 | 21500 | 0.6753          | 0.7498   |
	| 0.8343        | 4.7857 | 22000 | 0.6762          | 0.7498   |
	| 0.8381        | 4.8945 | 22500 | 0.6749          | 0.7503   |
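
The evaluation results reported at the top of this card match the last logged row. A small sketch of how that can be checked against the log is given below, using only the final five rows of the table; the tuple layout and variable names are illustrative choices, not part of any training script.

```python
# (training_loss, epoch, step, validation_loss, accuracy)
# Values copied from the last five rows of the results table.
rows = [
    (0.8191, 4.4594, 20500, 0.6788, 0.7476),
    (0.838, 4.5682, 21000, 0.6802, 0.7490),
    (0.8383, 4.6770, 21500, 0.6753, 0.7498),
    (0.8343, 4.7857, 22000, 0.6762, 0.7498),
    (0.8381, 4.8945, 22500, 0.6749, 0.7503),
]

# Pick the evaluation step with the lowest validation loss
# and the one with the highest accuracy.
best_by_loss = min(rows, key=lambda r: r[3])
best_by_accuracy = max(rows, key=lambda r: r[4])

print("lowest validation loss at step", best_by_loss[2])
print("highest accuracy at step", best_by_accuracy[2])
```

Both criteria select step 22500 here, so the final checkpoint is also the best of these rows on either metric.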


	### Framework versions

	- PEFT 0.12.1.dev0
	- Transformers 4.45.0.dev0
	- Pytorch 2.3.1+cu121
	- Datasets 2.21.0
	- Tokenizers 0.19.1