sonthenguyen/zephyr-sft-bnb-4bit-DPO-dismissive_mtbc-252steps

	---
	base_model: unsloth/zephyr-sft-bnb-4bit
	language:
	- en
	license: apache-2.0
	tags:
	- text-generation-inference
	- transformers
	- unsloth
	- mistral
	- trl
	- dpo
	---
	Training result, as returned by the trainer:

	TrainOutput(global_step=252, training_loss=0.0682461416144461, metrics={'train_runtime': 2060.2317, 'train_samples_per_second': 1.953, 'train_steps_per_second': 0.122, 'total_flos': 0.0, 'train_loss': 0.0682461416144461, 'epoch': 0.5009940357852882})

	Training configuration, from the Unsloth startup banner ("Unsloth - 2x faster free finetuning"):

	- Num GPUs = 1
	- Num examples = 8,046
	- Num Epochs = 1
	- Batch size per device = 4
	- Gradient Accumulation steps = 4
	- Total batch size = 16
	- Total steps = 252
	- Number of trainable parameters = 41,943,040

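
The figures in the training log above can be cross-checked with a little arithmetic. The sketch below recomputes the total batch size, the fraction of an epoch covered by 252 steps, and the step throughput from the logged values. The last part is an assumption that the card does not state: it checks whether the trainable-parameter count is consistent with rank-16 LoRA adapters on all seven linear projections of a Mistral-7B decoder layer. The layer dimensions are the published Mistral-7B ones, and the variable names are illustrative only.

```python
import math

# Values copied from the training log above.
num_examples = 8_046
per_device_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 1
total_steps = 252
train_runtime_s = 2060.2317

# Effective batch size per optimizer step.
total_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus

# Optimizer steps needed for one full pass over the data.
steps_per_epoch = math.ceil(num_examples / total_batch_size)

# 252 steps cover only about half of one epoch.
epoch_fraction = total_steps / steps_per_epoch
steps_per_second = total_steps / train_runtime_s

print(total_batch_size)            # 16
print(steps_per_epoch)             # 503
print(f"{epoch_fraction:.4f}")     # 0.5010
print(f"{steps_per_second:.3f}")   # 0.122

# ASSUMPTION (not stated in the card): rank-16 LoRA on q/k/v/o and
# gate/up/down projections of every Mistral-7B decoder layer.
hidden, intermediate, kv_dim, num_layers, rank = 4096, 14336, 1024, 32, 16
projection_shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, kv_dim),
    "v_proj": (hidden, kv_dim),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, intermediate),
    "up_proj": (hidden, intermediate),
    "down_proj": (intermediate, hidden),
}
# Each adapter adds two matrices: (in x rank) and (rank x out).
lora_params = num_layers * sum(
    rank * (d_in + d_out) for d_in, d_out in projection_shapes.values()
)
print(f"{lora_params:,}")          # 41,943,040
```

The recomputed epoch fraction matches the logged `'epoch'` value, which suggests the run was stopped at 252 steps rather than completing the single configured epoch. The parameter count matching under the LoRA assumption is suggestive but does not confirm the actual adapter configuration.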
	# Uploaded model

	- Developed by: sonthenguyen
	- License: apache-2.0
	- Finetuned from model: unsloth/zephyr-sft-bnb-4bit

	This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.

	[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)