---
license: other
license_name: qwen
license_link: https://huggingface.co./Qwen/Qwen2.5-72B-Instruct/blob/main/LICENSE
datasets:
- Magpie-Align/Magpie-Qwen2.5-Pro-300K-Filtered
base_model:
- Qwen/Qwen2.5-7B-Instruct
library_name: transformers
tags:
- generated_from_trainer
language:
- en
---
# cybertron-v4-qw7B-MGS
Introducing **cybertron-v4**, based on Qwen2.5 7B and fine-tuned with SFT over Magpie-Align/Magpie-Qwen2.5-Pro-1M-v0.1.
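
A minimal usage sketch with `transformers`; the repo id `fblgit/cybertron-v4-qw7B-MGS`, the bf16 dtype, and the use of the standard Qwen2.5 chat template are assumptions, not confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/cybertron-v4-qw7B-MGS"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed dtype
    device_map="auto",
)

# Qwen2.5-style chat formatting (assumed to be inherited from the base model)
messages = [{"role": "user", "content": "Explain gradient accumulation in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```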
## Training procedure
1 epoch, as usual.
[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
### Training hyperparameters
The following hyperparameters were used during training:
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- num_epochs: 1
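
For reference, a hedged `TrainingArguments` sketch matching the values above; the per-device batch size / gradient-accumulation split, the bf16 flag, and the output directory are assumptions, since only the totals are reported:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="cybertron-v4-qw7B-MGS",  # hypothetical output path
    seed=42,
    num_train_epochs=1,
    per_device_train_batch_size=2,   # 2 x 8 GPUs x 8 accum = 128 total (assumed split)
    gradient_accumulation_steps=8,   # assumed split; only the total of 128 is reported
    per_device_eval_batch_size=2,    # 2 x 8 GPUs = 16 total
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # assumption; training dtype is not stated in the card
)
```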
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.7405 | 0.0007 | 1 | 0.5760 |
| 0.6146 | 0.0502 | 71 | 0.5045 |
| 0.5908 | 0.1003 | 142 | 0.4930 |
| 0.5669 | 0.1505 | 213 | 0.4854 |
| 0.5575 | 0.2007 | 284 | 0.4811 |
| 0.535 | 0.2508 | 355 | 0.4765 |
| 0.5161 | 0.3010 | 426 | 0.4736 |
| 0.5268 | 0.3511 | 497 | 0.4726 |
| 0.5119 | 0.4013 | 568 | 0.4701 |
| 0.5329 | 0.4515 | 639 | 0.4687 |
| 0.5167 | 0.5016 | 710 | 0.4673 |
| 0.5105 | 0.5518 | 781 | 0.4660 |
| 0.5203 | 0.6020 | 852 | 0.4653 |
| 0.5035 | 0.6521 | 923 | 0.4646 |
| 0.4903 | 0.7023 | 994 | 0.4641 |
| 0.5031 | 0.7525 | 1065 | 0.4628 |
| 0.5147 | 0.8026 | 1136 | 0.4629 |
| 0.5037 | 0.8528 | 1207 | 0.4620 |
| 0.5029 | 0.9029 | 1278 | 0.4620 |
| 0.492 | 0.9531 | 1349 | 0.4621 |
### Framework versions
- PEFT 0.13.2
- Transformers 4.45.2
- PyTorch 2.3.0+cu121
- Datasets 3.0.1
- Tokenizers 0.20.1