|
--- |
|
library_name: transformers |
|
license: mit |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
# Model Card for TokenSwift-DeepSeek-R1-Distill-Qwen-32B |
|
|
|
This model implements TokenSwift, a framework that accelerates text generation for long sequences (up to 100K tokens), as described in [From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens](https://arxiv.org/abs/2502.18890). |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
This model is a finetuned version of DeepSeek-R1-Distill-Qwen-32B (itself based on Qwen2.5 32B), adapted for efficient long-sequence text generation with the TokenSwift framework. TokenSwift achieves lossless acceleration by drafting multiple candidate tokens per forward pass, verifying the candidates against the full model with a tree-based attention mechanism, and reusing the KV cache during verification. This reduces generation time for ultra-long sequences substantially while preserving output quality.
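
At a high level, this family of methods follows a draft-then-verify loop. The sketch below is only a conceptual illustration of that loop, not the TokenSwift implementation: `draft_fn` and `verify_fn` are hypothetical placeholders standing in for the paper's multi-token drafting and tree-attention verification.

```python
# Conceptual sketch of a draft-then-verify decoding loop (illustration only,
# not the TokenSwift code). draft_fn proposes candidate continuations cheaply;
# verify_fn checks them against the full model in a single forward pass.

def draft_and_verify_generate(draft_fn, verify_fn, prompt_ids, max_new_tokens):
    tokens = list(prompt_ids)
    while len(tokens) - len(prompt_ids) < max_new_tokens:
        # 1) Cheaply propose several candidate continuations of the current prefix.
        candidates = draft_fn(tokens)
        # 2) Score all candidates with the full model at once, reusing the
        #    KV cache for the shared prefix.
        accepted = verify_fn(tokens, candidates)
        # 3) Keep only tokens the full model agrees with; at least one token is
        #    always accepted, so the output matches ordinary decoding (lossless).
        tokens.extend(accepted)
    return tokens[: len(prompt_ids) + max_new_tokens]
```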
|
|
|
- **Developed by:** [BigAI NLCO](https://www.bigai.ai/) |
|
- **License:** MIT |
|
- **Finetuned from model:** DeepSeek-R1-Distill-Qwen-32B (based on Qwen2.5 32B)
|
|
|
### Model Sources |
|
|
|
- **Repository:** https://huggingface.co./TokenSwift/TokenSwift-DeepSeek-R1-Distill-Qwen-32B |
|
- **Paper:** https://arxiv.org/abs/2502.18890 |
|
- **Code:** https://github.com/bigai-nlco/TokenSwift |
|
- **Demo:** https://github.com/user-attachments/assets/5094fca7-0b12-470c-a7b6-456d254855d1 |
|
|
|
## Uses |
|
|
|
### Direct Use |
|
|
|
This model can be used directly for generating long sequences of text. See the code example below for how to get started. |
|
|
|
### Downstream Use |
|
|
|
This model can be further fine-tuned for specific downstream tasks requiring long sequence generation. |
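
As one common route, the sketch below uses parameter-efficient fine-tuning with LoRA via the `peft` library. The hyperparameters and target module names are illustrative assumptions (they presume a Qwen2-style attention layout), not values recommended by the TokenSwift authors.

```python
# Minimal LoRA fine-tuning sketch (illustrative assumptions, not an official recipe).
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "TokenSwift/TokenSwift-DeepSeek-R1-Distill-Qwen-32B",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

lora_cfg = LoraConfig(
    r=16,                       # illustrative rank
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumes Qwen2-style naming
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
# Train on your own long-sequence dataset, e.g. with transformers.Trainer or trl's SFTTrainer.
```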
|
|
|
### Out-of-Scope Use |
|
|
|
This model is not intended for short-form text generation or for other NLP tasks such as classification or translation, where its long-sequence acceleration provides no benefit. It should also not be used to generate malicious or harmful content.
|
|
|
## Bias, Risks, and Limitations |
|
|
|
Like other large language models, this model may reproduce biases present in its training data; users should be aware of these potential biases and use the model responsibly. In addition, performance may degrade on inputs that differ substantially from the training distribution.
|
|
|
## How to Get Started with the Model |
|
|
|
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TokenSwift/TokenSwift-DeepSeek-R1-Distill-Qwen-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)

# Example usage with the standard Hugging Face generation API.
# See the GitHub repository for TokenSwift's own inference scripts and configuration.
prompt = "Generate a long story about a futuristic city."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# max_new_tokens counts only generated tokens (max_length would include the prompt).
output_ids = model.generate(**inputs, max_new_tokens=10000)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
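
If this checkpoint keeps the chat template of its DeepSeek-R1-Distill-Qwen-32B base (not stated in this card, so treat it as an assumption), prompting through the chat template may work better than a raw string prompt. The `tokenizer` and `model` variables are the ones loaded above.

```python
# Optional: prompt via the chat template, assuming the base model's template is preserved.
messages = [{"role": "user", "content": "Write a very long story about a futuristic city."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=10000)
# Strip the prompt tokens before decoding so only the generated text is printed.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```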
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
The model was trained on a filtered subset of the [PG-19](https://huggingface.co./datasets/deepmind/pg19) dataset, with sequences longer than 8K tokens removed. Processed training data can be found at [qwen2.5-pg19](https://huggingface.co./datasets/TokenSwift/qwen2.5_pg19_train_data). |
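
For illustration, a length filter of this kind could be reproduced along the lines of the sketch below. The exact preprocessing used by the authors is in the GitHub repository and the released dataset; the 8192-token threshold and the `text` column name here are assumptions.

```python
# Illustrative sketch of a token-length filter over PG-19 (the released
# TokenSwift/qwen2.5_pg19_train_data dataset is the authoritative version).
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "TokenSwift/TokenSwift-DeepSeek-R1-Distill-Qwen-32B", trust_remote_code=True
)
pg19 = load_dataset("deepmind/pg19", split="train")

def within_8k(example):
    # Keep only documents that tokenize to at most 8K tokens (8192 assumed here).
    # Tokenizing full books is slow, so parallelizing with num_proc helps.
    return len(tokenizer(example["text"]).input_ids) <= 8192

filtered = pg19.filter(within_8k, num_proc=8)
print(f"Kept {len(filtered)} of {len(pg19)} documents")
```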
|
|
|
### Training Procedure |
|
|
|
Details of the training procedure can be found in the associated paper and the GitHub repository.
|
|
|
## Citation |
|
|
|
```bibtex |
|
@misc{wu2025hoursminuteslosslessacceleration, |
|
title={From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens}, |
|
author={Tong Wu and Junzhe Shen and Zixia Jia and Yuxuan Wang and Zilong Zheng}, |
|
year={2025}, |
|
eprint={2502.18890}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL}, |
|
url={https://arxiv.org/abs/2502.18890}, |
|
} |
|
``` |