---
license: apache-2.0
language:
- ar
- en
base_model:
- ALLaM-AI/ALLaM-7B-Instruct-preview
---

# 🦙 ALLaM-7B-Instruct-GGUF

This repository provides **quantized GGUF versions** of **ALLaM-7B-Instruct**, optimized for efficient inference with `llama.cpp`.

## ⚠️ **Acknowledgment**

The **original model** was developed by **ALLaM-AI** and is available here:
🔗 [ALLaM-7B-Instruct-Preview](https://huggingface.co/ALLaM-AI/ALLaM-7B-Instruct-preview)

This repository **only provides quantized versions** for improved performance on different hardware.

---

## ✨ **Overview**

**ALLaM-7B-Instruct** is an Arabic-centric, **instruction-tuned** model based on **Meta's LLaMA architecture**, designed for natural-language understanding and generation in Arabic.

### **🚀 What's New?**

✅ **GGUF format** – optimized for `llama.cpp`
✅ **Multiple quantization levels** – balance precision against efficiency
✅ **Runs on CPUs and low-resource devices** – no high-end GPU required!

---

## 📂 **Available Model Quantizations**

| Model Variant | Precision | Size | Best For |
|---------------|-----------|------|----------|
| `ALLaM-7B-Instruct-f16.gguf` | FP16 | Large | High-precision tasks |
| `ALLaM-7B-Instruct-Q8_0.gguf` | 8-bit | Medium | Balanced quality & speed |
| `ALLaM-7B-Instruct-Q6_K.gguf` | 6-bit | Small | Good trade-off |
| `ALLaM-7B-Instruct-Q5_0.gguf` | 5-bit | Small | Alternative quantization |
| `ALLaM-7B-Instruct-Q5_K_M.gguf` | 5-bit | Smaller | Fast inference |
| `ALLaM-7B-Instruct-Q4_0.gguf` | 4-bit | Very small | Legacy format |
| `ALLaM-7B-Instruct-Q4_K_M.gguf` | 4-bit | Very small | Low-memory devices |
| `ALLaM-7B-Instruct-Q2_K.gguf` | 2-bit | Smallest | Extreme efficiency |

If you want to reproduce these files yourself, see the quantization sketch in the appendix at the end of this card.

---

## 📖 **Installation & Setup**

### **1️⃣ Install `llama.cpp`**

Clone and build `llama.cpp`. Recent versions build with CMake (older checkouts used `make`, and the main binary was called `./main` rather than `llama-cli`):

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

### **2️⃣ Download the Model**

Choose and download a `.gguf` file from this repository (a scripted-download sketch is included in the appendix below).

### **3️⃣ Run Inference**

Use `llama.cpp` to generate responses:

```bash
./build/bin/llama-cli -m ALLaM-7B-Instruct-Q4_0.gguf -p "كيف أجهز كوب شاي؟"
```

The prompt asks: "How do I prepare a cup of tea?"

Expected output:

```
لتحضير كوب شاي، اغلي الماء، ضع الشاي في الكوب، واسكب الماء الساخن فوقه. اتركه لدقائق ثم استمتع بمذاقه!
```

Translation: "To prepare a cup of tea, boil the water, put the tea in the cup, and pour the hot water over it. Let it steep for a few minutes, then enjoy!"

---

## 📊 **Benchmarks & Performance**

| Quantization Format | Model Size | CPU (tokens/sec) | GPU (tokens/sec) |
|---------------------|------------|------------------|------------------|
| FP16 | Large | ~2 | ~15 |
| Q8_0 | Medium | ~4 | ~30 |
| Q6_K | Smaller | ~6 | ~40 |
| Q5_0 | Small | ~7 | ~42 |
| Q5_K_M | Smaller | ~8 | ~45 |
| Q4_0 | Very small | ~9 | ~48 |
| Q4_K_M | Very small | ~10 | ~50 |
| Q2_K | Smallest | ~12 | ~55 |

*These figures are indicative only; actual throughput depends on your CPU/GPU, thread count, context length, and build options. A `llama-bench` sketch for measuring your own numbers is included in the appendix below.*

---

## 📜 **License**

This model follows the **ALLaM-AI** license. Refer to their [Hugging Face repository](https://huggingface.co/ALLaM-AI/ALLaM-7B-Instruct-preview) for details.

## ❤️ **Acknowledgments**

- **ALLaM-AI** for developing the original **ALLaM-7B-Instruct** model.
- **llama.cpp** by [ggerganov](https://github.com/ggerganov) for optimized inference.

## ⭐ **Contributions & Feedback**

If you find this quantized model useful, feel free to contribute, provide feedback, or share your results!

---
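## 🛠️ **Appendix: Usage Sketches**

The sketches below expand on the steps above. They are illustrative, not authoritative: repository paths, file names, and flag values marked as placeholders should be adapted to your setup.

### 📥 **Scripted Download**

The download step (2️⃣) can be scripted with the `huggingface-cli` tool from the `huggingface_hub` package. A minimal sketch; `<user>/ALLaM-7B-Instruct-GGUF` is a placeholder for this repository's actual ID:

```bash
# Install the Hugging Face Hub CLI (assumes Python and pip are available).
pip install -U huggingface_hub

# Fetch a single quantized file into the current directory.
# NOTE: "<user>/ALLaM-7B-Instruct-GGUF" is a placeholder for this repo's ID.
huggingface-cli download <user>/ALLaM-7B-Instruct-GGUF \
  ALLaM-7B-Instruct-Q4_K_M.gguf --local-dir .
```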
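### 💬 **Tuning Generation Flags**

The one-liner in step 3️⃣ uses `llama-cli` defaults. A hedged sketch of commonly tuned flags; the values shown are illustrative starting points, not settings tuned for this model, and the Arabic prompt means "Write a short paragraph about artificial intelligence":

```bash
# -n: max new tokens; -c: context size; -t: CPU threads (match physical cores);
# --temp: sampling temperature; -ngl: layers offloaded to GPU (requires a GPU build).
./build/bin/llama-cli \
  -m ALLaM-7B-Instruct-Q4_K_M.gguf \
  -p "اكتب فقرة قصيرة عن الذكاء الاصطناعي." \
  -n 256 -c 4096 -t 8 --temp 0.7 -ngl 99
```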
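### 🌐 **Serving over HTTP**

If you prefer an HTTP API to an interactive CLI, `llama.cpp` also ships `llama-server`, which exposes an OpenAI-compatible endpoint. A minimal sketch; the port and context size are arbitrary choices:

```bash
# Start the server with a quantized model.
./build/bin/llama-server -m ALLaM-7B-Instruct-Q4_K_M.gguf -c 4096 --port 8080

# In another shell: query the OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "كيف أجهز كوب شاي؟"}]}'
```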
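### 🧮 **Reproducing the Quantizations**

The variants in the quantization table were produced with standard `llama.cpp` tooling; a sketch of that pipeline, assuming you have a local checkout of the original checkpoint (paths are placeholders) and have installed the converter's requirements (`pip install -r requirements.txt` inside `llama.cpp`):

```bash
# 1. Convert the original HF checkpoint to an FP16 GGUF file.
#    "./ALLaM-7B-Instruct-preview" is a placeholder for your local copy of the base model.
python convert_hf_to_gguf.py ./ALLaM-7B-Instruct-preview \
  --outtype f16 --outfile ALLaM-7B-Instruct-f16.gguf

# 2. Quantize the FP16 file down to any of the listed formats, e.g. Q4_K_M.
./build/bin/llama-quantize ALLaM-7B-Instruct-f16.gguf \
  ALLaM-7B-Instruct-Q4_K_M.gguf Q4_K_M
```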
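### ⏱️ **Measuring Throughput**

To check the benchmark table against your own hardware, `llama.cpp` includes `llama-bench`. A sketch; the prompt/generation lengths and thread count are arbitrary examples:

```bash
# Reports prompt-processing (-p tokens) and generation (-n tokens) throughput.
./build/bin/llama-bench -m ALLaM-7B-Instruct-Q4_K_M.gguf -p 512 -n 128 -t 8
```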