	---
	library_name: transformers
	tags:
	- generated_from_trainer
	datasets:
	- kanishka/babylm2-rewritten-clean-spacy-random_removal_numadj
	metrics:
	- accuracy
	model-index:
	- name: opt-babylm2-rewritten-clean-spacy-random_removal_numadj-earlystop-bpe_seed-42_1e-3
	  results:
	  - task:
	      name: Causal Language Modeling
	      type: text-generation
	    dataset:
	      name: kanishka/babylm2-rewritten-clean-spacy-random_removal_numadj
	      type: kanishka/babylm2-rewritten-clean-spacy-random_removal_numadj
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.47811193958124093
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# opt-babylm2-rewritten-clean-spacy-random_removal_numadj-earlystop-bpe_seed-42_1e-3

	This model was trained from scratch on the kanishka/babylm2-rewritten-clean-spacy-random_removal_numadj dataset.
	It achieves the following results on the evaluation set:
	- Loss: 2.6927
	- Accuracy: 0.4781
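
For a language model, the evaluation loss is often easier to read as a perplexity. The following is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats. That is the usual convention for causal language modeling, but this card does not state it.

```python
import math

# Evaluation loss as reported in this card.
eval_loss = 2.6927

# Assumption: the loss is a mean per-token cross-entropy measured in nats,
# so perplexity is simply its exponential.
perplexity = math.exp(eval_loss)

print(f"eval loss {eval_loss} -> perplexity of about {perplexity:.2f}")
```

Under that assumption the final checkpoint has a perplexity of roughly 14.8 on the evaluation set.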

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.001
	- train_batch_size: 32
	- eval_batch_size: 64
	- seed: 42
	- gradient_accumulation_steps: 8
	- total_train_batch_size: 256
	- optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_steps: 32000
	- num_epochs: 20.0
	- mixed_precision_training: Native AMP
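
Two of the values above are derived rather than set directly, and a short sketch may make them easier to check. It assumes a single training device (the card does not list a device count) and takes the total number of optimizer steps, 44860, from the last row of the training results table. The function `linear_schedule_lr` is a hypothetical helper that writes out the standard linear warmup and decay rule by hand. It is not a call into the Transformers library.

```python
def linear_schedule_lr(step, base_lr=1e-3, warmup_steps=32_000, total_steps=44_860):
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


# Effective batch size per optimizer step.
per_device_batch = 32   # train_batch_size
grad_accum = 8          # gradient_accumulation_steps
num_devices = 1         # assumption: not stated in the card
effective_batch = per_device_batch * grad_accum * num_devices

print("effective batch size:", effective_batch)
for step in (0, 16_000, 32_000, 38_430, 44_860):
    print(step, linear_schedule_lr(step))
```

With these settings the learning rate peaks at 0.001 at step 32000 and decays to zero over the remaining 12860 steps, so most of the run is spent in warmup.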

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-------:|:-----:|:---------------:|:--------:|
	| 32.5094 | 0.9997 | 2243 | 3.8102 | 0.3615 |
	| 27.4934 | 1.9997 | 4486 | 3.2932 | 0.4103 |
	| 24.9364 | 2.9997 | 6729 | 3.0832 | 0.4316 |
	| 23.6407 | 3.9997 | 8972 | 2.9806 | 0.4416 |
	| 22.7053 | 4.9997 | 11215 | 2.9227 | 0.4477 |
	| 22.2601 | 5.9997 | 13458 | 2.8859 | 0.4513 |
	| 21.9123 | 6.9997 | 15701 | 2.8600 | 0.4546 |
	| 21.6403 | 7.9997 | 17944 | 2.8425 | 0.4570 |
	| 21.5087 | 8.9997 | 20187 | 2.8276 | 0.4585 |
	| 21.3483 | 9.9997 | 22430 | 2.8189 | 0.4596 |
	| 21.2068 | 10.9997 | 24673 | 2.8091 | 0.4604 |
	| 21.0757 | 11.9997 | 26916 | 2.8028 | 0.4610 |
	| 21.12 | 12.9997 | 29159 | 2.7997 | 0.4619 |
	| 21.0442 | 13.9997 | 31402 | 2.7952 | 0.4622 |
	| 20.9217 | 14.9997 | 33645 | 2.7750 | 0.4649 |
	| 20.5419 | 15.9997 | 35888 | 2.7506 | 0.4683 |
	| 20.1666 | 16.9997 | 38131 | 2.7245 | 0.4714 |
	| 19.7172 | 17.9997 | 40374 | 2.7101 | 0.4740 |
	| 19.1888 | 18.9997 | 42617 | 2.6955 | 0.4768 |
	| 18.63 | 19.9997 | 44860 | 2.6927 | 0.4781 |
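
The table can be cross-checked against the hyperparameters with a little arithmetic. The sketch below copies the Step and Validation Loss columns by hand and derives the steps per epoch, the point in training where the 32000 warmup steps end, and the per-epoch improvement in validation loss. It uses only the validation column, because the Training Loss column is on a different scale (for example 18.63 against 2.6927 in the last row) and the card does not explain why.

```python
# Step and Validation Loss columns, copied from the table above.
steps = [2243 * (i + 1) for i in range(20)]
val_loss = [
    3.8102, 3.2932, 3.0832, 2.9806, 2.9227, 2.8859, 2.8600, 2.8425, 2.8276, 2.8189,
    2.8091, 2.8028, 2.7997, 2.7952, 2.7750, 2.7506, 2.7245, 2.7101, 2.6955, 2.6927,
]
warmup_steps = 32_000  # lr_scheduler_warmup_steps from the hyperparameters

steps_per_epoch = steps[0]
warmup_epochs = warmup_steps / steps_per_epoch

# Improvement in validation loss from one evaluation to the next.
deltas = [prev - cur for prev, cur in zip(val_loss, val_loss[1:])]

print("steps per epoch:", steps_per_epoch)
print(f"warmup ends after about {warmup_epochs:.2f} epochs")
for epoch, d in enumerate(deltas, start=2):
    print(f"epoch {epoch:2d}: validation loss improved by {d:.4f}")
```

Warmup ends partway through the fifteenth epoch, between the evaluations at steps 31402 and 33645. Validation loss decreases at every evaluation in the table.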


	### Framework versions

	- Transformers 4.47.1
	- Pytorch 2.5.1+cu124
	- Datasets 3.1.0
	- Tokenizers 0.21.0