devvanshhh/flan-xl-gen6

Text2Text Generation

Generated from Trainer

text-generation-inference

Inference Endpoints

8-bit precision

	---
	base_model: ybelkada/flan-t5-xl-sharded-bf16
	tags:
	- generated_from_trainer
	metrics:
	- rouge
	model-index:
	- name: flan-xl-gen6
	  results: []
	---

	<!-- This model card has been generated automatically according to the information the Trainer had access to. You
	should probably proofread and complete it, then remove this comment. -->

	# flan-xl-gen6

	This model is a fine-tuned version of [ybelkada/flan-t5-xl-sharded-bf16](https://huggingface.co/ybelkada/flan-t5-xl-sharded-bf16) on an unspecified dataset (the Trainer recorded the dataset as `None`).
	It achieves the following results on the evaluation set after the final epoch (step 2624):
	- Loss: 0.4978
	- Rouge1: 29.5362
	- Rouge2: 20.6621
	- Rougel: 25.7689
	- Rougelsum: 26.2351
	- Gen Len: 12.7388
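Because the base model is a FLAN-T5 checkpoint, the fine-tuned weights can probably be loaded with the standard sequence-to-sequence classes in Transformers. The following is a minimal sketch, not a documented usage recipe: it assumes the repository `devvanshhh/flan-xl-gen6` contains full model weights and tokenizer files, and the helper name `generate_text`, the prompt format, and the `max_new_tokens` default are illustrative choices, since the card does not describe the expected input.

```python
def generate_text(
    prompt: str,
    model_id: str = "devvanshhh/flan-xl-gen6",
    max_new_tokens: int = 32,
) -> str:
    """Load the checkpoint and generate one output string for `prompt`.

    Assumption: the repository holds a complete seq2seq checkpoint plus
    tokenizer files. The card does not confirm this.
    """
    # Imported lazily so that defining this helper does not require
    # transformers or torch to be installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Tokenize the prompt and decode the first generated sequence.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text("your input text")` downloads the checkpoint on first use. The average generated length reported above is roughly 13 tokens, so a small `max_new_tokens` budget is likely sufficient for inputs similar to the evaluation set.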

	## Model description

	More information needed

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0005
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 800
	- num_epochs: 8
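To make the scheduler settings concrete, here is a small sketch of the learning rate they imply at a given optimizer step. It follows the usual linear-warmup-then-cosine-decay shape (half a cosine cycle, decaying to zero). The total of 2624 steps is an assumption taken from the training results table (8 epochs of 328 steps), and the name `lr_at` is illustrative.

```python
import math

BASE_LR = 0.0005      # learning_rate from the card
WARMUP_STEPS = 800    # lr_scheduler_warmup_steps from the card
TOTAL_STEPS = 2624    # assumption: 8 epochs x 328 steps per epoch


def lr_at(step: int) -> float:
    """Learning rate at `step` under linear warmup and cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 400, 800, 1712, 2624):
        print(step, f"{lr_at(step):.6f}")
```

Under these assumptions the warmup spans about the first 2.4 epochs (800 of 2624 steps), so the peak learning rate of 0.0005 is only reached partway through epoch 3.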

	### Training results

	| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
	|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
	| No log        | 1.0   | 328  | 0.6921          | 34.9112 | 26.7503 | 31.4124 | 31.7295   | 10.0172 |
	| 6.8746        | 2.0   | 656  | 0.6025          | 33.9134 | 25.3236 | 30.1968 | 30.472    | 10.8454 |
	| 6.8746        | 3.0   | 984  | 0.5687          | 31.6178 | 22.9463 | 27.8758 | 28.3572   | 11.8729 |
	| 0.6462        | 4.0   | 1312 | 0.5355          | 30.8157 | 22.1783 | 27.1641 | 27.569    | 12.1306 |
	| 0.5618        | 5.0   | 1640 | 0.5160          | 29.9183 | 21.0842 | 26.1671 | 26.5965   | 12.5017 |
	| 0.5618        | 6.0   | 1968 | 0.5025          | 29.7823 | 21.1443 | 26.0286 | 26.5215   | 12.5086 |
	| 0.498         | 7.0   | 2296 | 0.4978          | 29.1043 | 20.2391 | 25.3347 | 25.804    | 12.8969 |
	| 0.4551        | 8.0   | 2624 | 0.4978          | 29.5362 | 20.6621 | 25.7689 | 26.2351   | 12.7388 |
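In the table above, validation loss falls across epochs while ROUGE-1 is highest after the first epoch, so the "best" checkpoint depends on the selection criterion. The sketch below compares the two criteria using values copied from the table; the variable names are illustrative, and nothing here is confirmed as the selection rule used for the published weights.

```python
# (epoch, step, validation_loss, rouge1, gen_len), copied from the table above.
ROWS = [
    (1, 328, 0.6921, 34.9112, 10.0172),
    (2, 656, 0.6025, 33.9134, 10.8454),
    (3, 984, 0.5687, 31.6178, 11.8729),
    (4, 1312, 0.5355, 30.8157, 12.1306),
    (5, 1640, 0.5160, 29.9183, 12.5017),
    (6, 1968, 0.5025, 29.7823, 12.5086),
    (7, 2296, 0.4978, 29.1043, 12.8969),
    (8, 2624, 0.4978, 29.5362, 12.7388),
]

# Lowest validation loss; on a tie, min() keeps the earliest epoch.
best_by_loss = min(ROWS, key=lambda row: row[2])
# Highest ROUGE-1 score.
best_by_rouge1 = max(ROWS, key=lambda row: row[3])
# Optimizer steps per epoch, read off the first row.
steps_per_epoch = ROWS[0][1]

if __name__ == "__main__":
    print("best epoch by validation loss:", best_by_loss[0])
    print("best epoch by ROUGE-1:", best_by_rouge1[0])
    print("steps per epoch:", steps_per_epoch)
```

The two criteria disagree: epochs 7 and 8 tie on validation loss, while ROUGE-1 peaks at epoch 1, when the average generated length is also shortest.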


	### Framework versions

	- Transformers 4.35.2
	- PyTorch 2.1.0+cu118
	- Datasets 2.15.0
	- Tokenizers 0.15.0