---
license: apache-2.0
language:
- ar
- bn
- cs
- de
- en
- es
- fa
- fr
- he
- hi
- id
- it
- ja
- km
- ko
- lo
- ms
- my
- nl
- pl
- pt
- ru
- th
- tl
- tr
- ur
- vi
- zh
base_model:
- ModelSpace/GemmaX2-28-2B-v0.1
pipeline_tag: translation
library_name: transformers
tags:
- gemma
- translation
- multilingual
- quantized
---
# Model Card for GemmaX2-28-2B GGUF Quantizations
## Model Overview
**GemmaX2-28-2B GGUF Quantizations** are a set of quantized variants of `GemmaX2-28-2B-v0.1`, an LLM-based translation model developed by Xiaomi. The original model was finetuned from `GemmaX2-28-2B-Pretrain`, which itself is a continually pretrained version of `Gemma2-2B` trained on a diverse dataset of 56 billion tokens spanning 28 languages. These GGUF versions (`f16`, `bf16`, `q8_0`, `tq1_0`, `tq2_0`) were created to enable efficient inference in resource-constrained environments while preserving the model's translation capabilities.
- **Developed by**: Xiaomi (original model); quantized by Tonic
- **Model Type**: Transformer-based language model, finetuned for translation, quantized to GGUF format
- **Quantization Formats**: `f16` (16-bit float), `bf16` (bfloat16), `q8_0` (8-bit quantization), `tq1_0` (ternary quantization, ~1.69 bits per weight), `tq2_0` (ternary quantization, ~2.06 bits per weight)
- **Languages**: Arabic, Bengali, Czech, German, English, Spanish, Persian, French, Hebrew, Hindi, Indonesian, Italian, Japanese, Khmer, Korean, Lao, Malay, Burmese, Dutch, Polish, Portuguese, Russian, Thai, Tagalog, Turkish, Urdu, Vietnamese, Chinese
- **License**: [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)
- **Repository**: [Tonic/GemmaX2-28-2B-gguf](https://huggingface.co./Tonic/GemmaX2-28-2B-gguf)
## Model Description
`GemmaX2-28-2B-v0.1` is designed for multilingual machine translation, built on `GemmaX2-28-2B-Pretrain`, which was pretrained on a mix of monolingual and parallel data (56 billion tokens) across 28 languages. The finetuning process used a small, high-quality set of translation instruction data to enhance its performance. These GGUF quantizations were generated using `convert_hf_to_gguf.py`, converting the original Hugging Face model into formats compatible with tools like `llama.cpp` for efficient deployment.
### Quantization Details
- **Source Model**: `ModelSpace/GemmaX2-28-2B-v0.1`
- **Conversion Tool**: `convert_hf_to_gguf.py`
- **Quantization Types**:
- `f16`: 16-bit floating-point, minimal precision loss, larger file size (~5-7GB).
- `bf16`: Brain floating-point 16-bit, optimized for certain hardware (e.g., TPUs), similar size to `f16`.
- `q8_0`: 8-bit quantization, reduced size (~3-4GB), slight precision trade-off.
- `tq1_0`: Ternary quantization (~1.69 bits per weight), smallest size (~1-2GB), highest precision loss.
- `tq2_0`: Ternary quantization (~2.06 bits per weight), slightly larger than `tq1_0`, better balance of size vs. quality.
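As a rough illustration of the conversion workflow, the sketch below loops over the five output types with `convert_hf_to_gguf.py`. It assumes a local `llama.cpp` checkout recent enough to support the `tq1_0`/`tq2_0` output types and a local snapshot of the source model; treat the paths as placeholders rather than an exact record of how these files were produced.

```python
# Hedged sketch: batch-convert a local HF snapshot of
# ModelSpace/GemmaX2-28-2B-v0.1 into the five GGUF variants above.
# Requires a llama.cpp checkout and its Python dependencies installed.
import subprocess

MODEL_DIR = "./GemmaX2-28-2B-v0.1"  # local snapshot of the source model
OUT_TYPES = ["f16", "bf16", "q8_0", "tq1_0", "tq2_0"]

for out_type in OUT_TYPES:
    subprocess.run(
        [
            "python", "llama.cpp/convert_hf_to_gguf.py", MODEL_DIR,
            "--outtype", out_type,
            "--outfile", f"gemmax2-28-2b-{out_type}.gguf",
        ],
        check=True,  # stop immediately if any conversion fails
    )
```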
## Intended Use
These quantized models are intended for:
- **Multilingual Translation**: Translating text across the 28 supported languages.
- **Efficient Inference**: Deployment on edge devices, low-memory systems, or environments with limited compute resources using GGUF-compatible frameworks (e.g., `llama.cpp`).
- **Research**: Studying the trade-offs between quantization levels and translation performance.
### Use Cases
- Real-time translation applications.
- Offline translation on mobile or embedded devices.
- Benchmarking quantized LLM performance in multilingual settings.
## Model Performance
The original `GemmaX2-28-2B-v0.1` model’s performance is detailed in the paper [Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study](https://arxiv.org/abs/2502.02481). Quantization introduces varying degrees of performance trade-offs:
- **`f16` and `bf16`**: Near-identical to the original model’s accuracy, with minimal degradation.
- **`q8_0`**: Slight reduction in translation quality, still suitable for most practical applications.
- **`tq1_0` and `tq2_0`**: Noticeable quality loss, best for scenarios prioritizing speed and size over precision.
Exact metrics depend on the downstream task and dataset; users are encouraged to evaluate performance for their specific use case.
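As a starting point for such an evaluation, the hedged sketch below scores one variant's translations against reference sentences with BLEU. It assumes the `llama-cpp-python` bindings and `sacrebleu` are installed, and the one-sentence test set is a placeholder for your own data; none of this is prescribed by the original model.

```python
# Hedged sketch: BLEU-score a quantized variant on a tiny test set.
# pip install llama-cpp-python sacrebleu
from llama_cpp import Llama
import sacrebleu

llm = Llama(model_path="gemmax2-28-2b-q8_0.gguf", n_ctx=512)

sources = ["我爱机器翻译"]                      # source sentences (Chinese)
references = ["I love machine translation."]  # gold English translations

hypotheses = []
for src in sources:
    prompt = f"Translate this from Chinese to English:\nChinese: {src}\nEnglish:"
    out = llm(prompt, max_tokens=64, stop=["\n"])
    hypotheses.append(out["choices"][0]["text"].strip())

# corpus_bleu takes the hypotheses plus a list of reference sets.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")
```

Repeating this for each quantization level gives a quick picture of how much quality the ternary formats give up on your language pairs.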
## How to Use
### With Transformers (Original Model)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "ModelSpace/GemmaX2-28-2B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
text = "Translate this from Chinese to English:\nChinese: 我爱机器翻译\nEnglish:"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With GGUF (Quantized Models)
Download a GGUF file from `Tonic/GemmaX2-28-2B-gguf` and use it with a GGUF-compatible inference tool like `llama.cpp`:
```bash
# Example with llama.cpp
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
make
# Run inference with the q8_0 model (newer llama.cpp builds name this binary llama-cli)
./main -m gemmax2-28-2b-q8_0.gguf -e -p "Translate this from Chinese to English:\nChinese: 我爱机器翻译\nEnglish:"
```
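If you prefer to stay in Python, the community `llama-cpp-python` bindings (an assumption of this example, not something this repo requires) can load the same files:

```python
# Hedged sketch: run a quantized GGUF file through llama-cpp-python.
# pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama(model_path="gemmax2-28-2b-q8_0.gguf", n_ctx=512)
prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译\nEnglish:"
out = llm(prompt, max_tokens=64, stop=["\n"])
print(out["choices"][0]["text"].strip())
```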
Available files:
- `gemmax2-28-2b-f16.gguf`
- `gemmax2-28-2b-bf16.gguf`
- `gemmax2-28-2b-q8_0.gguf`
- `gemmax2-28-2b-tq1_0.gguf`
- `gemmax2-28-2b-tq2_0.gguf`
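Each file can also be fetched programmatically with `huggingface_hub`; the filename below is just one of the variants listed above:

```python
# Download one quantized variant and print its local path.
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="Tonic/GemmaX2-28-2B-gguf",
    filename="gemmax2-28-2b-q8_0.gguf",  # swap in any file from the list
)
print(path)  # pass this to llama.cpp's -m flag or Llama(model_path=...)
```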
## Limitations
- **Language Support**: Only supports the 28 languages listed above; performance on unsupported languages is not guaranteed.
- **Quantization Trade-offs**: Lower-bit quantizations (`tq1_0`, `tq2_0`) may degrade translation quality, especially for complex sentences or rare language pairs.
- **Hardware Compatibility**: `bf16` benefits from specific hardware support (e.g., NVIDIA Ampere GPUs, TPUs); performance may vary otherwise.
- **Future Improvements**: The original authors plan to continue improving `GemmaX2-28-2B`'s translation capabilities; those updates will not appear here until the quantizations are regenerated from a newer checkpoint.
## Citation
For the original model:
```bibtex
@misc{cui2025multilingualmachinetranslationopen,
title={Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study},
author={Menglong Cui and Pengzhi Gao and Wei Liu and Jian Luan and Bin Wang},
year={2025},
eprint={2502.02481},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.02481},
}
```
For these quantized versions, please also credit:
- **Quantization by**: [Tonic](https://huggingface.co./Tonic)
- **Repository**: [Tonic/GemmaX2-28-2B-gguf](https://huggingface.co./Tonic/GemmaX2-28-2B-gguf)
## Contact
For questions about the original model, refer to Xiaomi’s publication. For issues with the GGUF quantizations, contact Tonic via Hugging Face discussions at `Tonic/GemmaX2-28-2B-gguf`. |