|
--- |
|
library_name: transformers |
|
license: mit |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Model Card for TokenSwift-DeepSeek-R1-Distill-Qwen-32B |
|
|
|
This model implements TokenSwift, a framework that accelerates text generation for long sequences (up to 100K tokens), as described in [From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens](https://arxiv.org/abs/2502.18890). |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
This model is a finetuned version of DeepSeek-R1-Distill-Qwen-32B (itself based on Qwen2.5 32B), adapted for efficient long-sequence text generation with the TokenSwift framework. TokenSwift achieves lossless acceleration by drafting multiple candidate tokens per forward pass, verifying the candidates against the full model with a tree-based attention mechanism, and reusing the KV cache during verification. This reduces generation time for ultra-long sequences substantially while preserving output quality.
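
At a high level, this family of methods follows a draft-then-verify loop. The sketch below is only a conceptual illustration of that loop, not the TokenSwift implementation: `draft_fn` and `verify_fn` are hypothetical placeholders standing in for the paper's multi-token drafting and tree-attention verification.

```python
# Conceptual sketch of a draft-then-verify decoding loop (illustration only,
# not the TokenSwift code). draft_fn proposes candidate continuations cheaply;
# verify_fn checks them against the full model in a single forward pass.

def draft_and_verify_generate(draft_fn, verify_fn, prompt_ids, max_new_tokens):
    tokens = list(prompt_ids)
    while len(tokens) - len(prompt_ids) < max_new_tokens:
        # 1) Cheaply propose several candidate continuations of the current prefix.
        candidates = draft_fn(tokens)
        # 2) Score all candidates with the full model at once, reusing the
        #    KV cache for the shared prefix.
        accepted = verify_fn(tokens, candidates)
        # 3) Keep only tokens the full model agrees with; at least one token is
        #    always accepted, so the output matches ordinary decoding (lossless).
        tokens.extend(accepted)
    return tokens[: len(prompt_ids) + max_new_tokens]
```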
|
|
|
- **Developed by:** [BigAI NLCO](https://www.bigai.ai/) |
|
- **License:** MIT |
|
- **Finetuned from model:** DeepSeek-R1-Distill-Qwen-32B (based on Qwen2.5 32B)
|
|
|
### Model Sources |
|
|
|
- **Repository:** https://huggingface.co./TokenSwift/TokenSwift-DeepSeek-R1-Distill-Qwen-32B |
|
- **Paper:** https://arxiv.org/abs/2502.18890 |
|
- **Code:** https://github.com/bigai-nlco/TokenSwift |
|
- **Demo:** https://github.com/user-attachments/assets/5094fca7-0b12-470c-a7b6-456d254855d1 |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
This model can be used directly for generating long sequences of text. See the code example below for how to get started. |
|
|
|
### Downstream Use |
|
|
|
This model can be further fine-tuned for specific downstream tasks requiring long sequence generation. |
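
As one common route, the sketch below uses parameter-efficient fine-tuning with LoRA via the `peft` library. The hyperparameters and target module names are illustrative assumptions (they presume a Qwen2-style attention layout), not values recommended by the TokenSwift authors.

```python
# Minimal LoRA fine-tuning sketch (illustrative assumptions, not an official recipe).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "TokenSwift/TokenSwift-DeepSeek-R1-Distill-Qwen-32B",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

lora_cfg = LoraConfig(
    r=16,                       # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumes Qwen2-style naming
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
# Train on your own long-sequence dataset, e.g. with transformers.Trainer or trl's SFTTrainer.
```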
|
|
|
### Out-of-Scope Use |
|
|
|
This model is not intended for short-form text generation or for other NLP tasks such as classification or translation, where its long-sequence acceleration provides no benefit. It should also not be used to generate malicious or harmful content.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
Like other large language models, this model may reproduce biases present in its training data; users should be aware of these potential biases and use the model responsibly. In addition, performance may degrade on inputs that differ substantially from the training distribution.
|
|
|
## How to Get Started with the Model |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TokenSwift/TokenSwift-DeepSeek-R1-Distill-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Example usage with the standard Hugging Face generation API.
# See the GitHub repository for TokenSwift's own inference scripts and configuration.
prompt = "Generate a long story about a futuristic city."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# max_new_tokens counts only generated tokens (max_length would include the prompt).
output_ids = model.generate(**inputs, max_new_tokens=10000)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
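
If this checkpoint keeps the chat template of its DeepSeek-R1-Distill-Qwen-32B base (not stated in this card, so treat it as an assumption), prompting through the chat template may work better than a raw string prompt. The `tokenizer` and `model` variables are the ones loaded above.

```python
# Optional: prompt via the chat template, assuming the base model's template is preserved.
messages = [{"role": "user", "content": "Write a very long story about a futuristic city."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=10000)
# Strip the prompt tokens before decoding so only the generated text is printed.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```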
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The model was trained on a filtered subset of the [PG-19](https://huggingface.co./datasets/deepmind/pg19) dataset, with sequences longer than 8K tokens removed. Processed training data can be found at [qwen2.5-pg19](https://huggingface.co./datasets/TokenSwift/qwen2.5_pg19_train_data). |
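
For illustration, a length filter of this kind could be reproduced along the lines of the sketch below. The exact preprocessing used by the authors is in the GitHub repository and the released dataset; the 8192-token threshold and the `text` column name here are assumptions.

```python
# Illustrative sketch of a token-length filter over PG-19 (the released
# TokenSwift/qwen2.5_pg19_train_data dataset is the authoritative version).
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "TokenSwift/TokenSwift-DeepSeek-R1-Distill-Qwen-32B", trust_remote_code=True
)
pg19 = load_dataset("deepmind/pg19", split="train")

def within_8k(example):
    # Keep only documents that tokenize to at most 8K tokens (8192 assumed here).
    # Tokenizing full books is slow, so parallelizing with num_proc helps.
    return len(tokenizer(example["text"]).input_ids) <= 8192

filtered = pg19.filter(within_8k, num_proc=8)
print(f"Kept {len(filtered)} of {len(pg19)} documents")
```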
|
|
|
### Training Procedure |
|
|
|
Details of the training procedure can be found in the associated paper and the GitHub repository.
|
|
|
## Citation |
|
|
|
```bibtex |
|
@misc{wu2025hoursminuteslosslessacceleration, |
|
title={From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens}, |
|
author={Tong Wu and Junzhe Shen and Zixia Jia and Yuxuan Wang and Zilong Zheng}, |
|
year={2025}, |
|
eprint={2502.18890}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2502.18890}, |
|
} |
|
``` |