---
license: apache-2.0
language:
- ar
- en
base_model:
- ALLaM-AI/ALLaM-7B-Instruct-preview
---

# 🦙 ALLaM-7B-Instruct-GGUF

This repository provides **quantized GGUF versions** of **ALLaM-7B-Instruct**, optimized for efficient inference with `llama.cpp`.

## ⚠️ **Acknowledgment**

The **original model** was developed by **ALLaM-AI** and is available here:
🔗 [ALLaM-7B-Instruct-Preview](https://huggingface.co/ALLaM-AI/ALLaM-7B-Instruct-preview)

This repository **only provides quantized versions** for improved performance on different hardware.

---

## ✨ **Overview**

**ALLaM-7B-Instruct** is an Arabic-centric, **instruction-tuned** model based on **Meta's LLaMA architecture**, designed for natural-language understanding and generation in Arabic.

### **🚀 What's New?**

✅ **GGUF format** – optimized for `llama.cpp`
✅ **Multiple quantization levels** – balance precision against efficiency
✅ **Runs on CPUs and low-resource devices** – no high-end GPU required!

---

## 📂 **Available Model Quantizations**

| Model Variant | Precision | Size | Best For |
|---------------|-----------|------|----------|
| `ALLaM-7B-Instruct-f16.gguf` | FP16 | Large | High-precision tasks |
| `ALLaM-7B-Instruct-Q8_0.gguf` | 8-bit | Medium | Balanced quality & speed |
| `ALLaM-7B-Instruct-Q6_K.gguf` | 6-bit | Small | Good trade-off |
| `ALLaM-7B-Instruct-Q5_0.gguf` | 5-bit | Small | Alternative quantization |
| `ALLaM-7B-Instruct-Q5_K_M.gguf` | 5-bit | Smaller | Fast inference |
| `ALLaM-7B-Instruct-Q4_0.gguf` | 4-bit | Very small | Legacy format |
| `ALLaM-7B-Instruct-Q4_K_M.gguf` | 4-bit | Very small | Low-memory devices |
| `ALLaM-7B-Instruct-Q2_K.gguf` | 2-bit | Smallest | Extreme efficiency |

If you want to reproduce these files yourself, see the quantization sketch in the appendix at the end of this card.

---

## 📖 **Installation & Setup**

### **1️⃣ Install `llama.cpp`**

Clone and build `llama.cpp`. Recent versions build with CMake (older checkouts used `make`, and the main binary was called `./main` rather than `llama-cli`):

```bash
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release
```

### **2️⃣ Download the Model**

Choose and download a `.gguf` file from this repository (a scripted-download sketch is included in the appendix below).

### **3️⃣ Run Inference**

Use `llama.cpp` to generate responses:

```bash
./build/bin/llama-cli -m ALLaM-7B-Instruct-Q4_0.gguf -p "كيف أجهز كوب شاي؟"
```

The prompt asks: "How do I prepare a cup of tea?"

Expected output:

```
لتحضير كوب شاي، اغلي الماء، ضع الشاي في الكوب، واسكب الماء الساخن فوقه. اتركه لدقائق ثم استمتع بمذاقه!
```

Translation: "To prepare a cup of tea, boil the water, put the tea in the cup, and pour the hot water over it. Let it steep for a few minutes, then enjoy!"

---

## 📊 **Benchmarks & Performance**

| Quantization Format | Model Size | CPU (tokens/sec) | GPU (tokens/sec) |
|---------------------|------------|------------------|------------------|
| FP16 | Large | ~2 | ~15 |
| Q8_0 | Medium | ~4 | ~30 |
| Q6_K | Smaller | ~6 | ~40 |
| Q5_0 | Small | ~7 | ~42 |
| Q5_K_M | Smaller | ~8 | ~45 |
| Q4_0 | Very small | ~9 | ~48 |
| Q4_K_M | Very small | ~10 | ~50 |
| Q2_K | Smallest | ~12 | ~55 |

*These figures are indicative only; actual throughput depends on your CPU/GPU, thread count, context length, and build options. A `llama-bench` sketch for measuring your own numbers is included in the appendix below.*

---

## 📜 **License**

This model follows the **ALLaM-AI** license. Refer to their [Hugging Face repository](https://huggingface.co/ALLaM-AI/ALLaM-7B-Instruct-preview) for details.

## ❤️ **Acknowledgments**

- **ALLaM-AI** for developing the original **ALLaM-7B-Instruct** model.
- **llama.cpp** by [ggerganov](https://github.com/ggerganov) for optimized inference.

## ⭐ **Contributions & Feedback**

If you find this quantized model useful, feel free to contribute, provide feedback, or share your results!

---
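## 🛠️ **Appendix: Usage Sketches**

The sketches below expand on the steps above. They are illustrative, not authoritative: repository paths, file names, and flag values marked as placeholders should be adapted to your setup.

### 📥 **Scripted Download**

The download step (2️⃣) can be scripted with the `huggingface-cli` tool from the `huggingface_hub` package. A minimal sketch; `<user>/ALLaM-7B-Instruct-GGUF` is a placeholder for this repository's actual ID:

```bash
# Install the Hugging Face Hub CLI (assumes Python and pip are available).
pip install -U huggingface_hub

# Fetch a single quantized file into the current directory.
# NOTE: "<user>/ALLaM-7B-Instruct-GGUF" is a placeholder for this repo's ID.
huggingface-cli download <user>/ALLaM-7B-Instruct-GGUF \
  ALLaM-7B-Instruct-Q4_K_M.gguf --local-dir .
```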
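### 💬 **Tuning Generation Flags**

The one-liner in step 3️⃣ uses `llama-cli` defaults. A hedged sketch of commonly tuned flags; the values shown are illustrative starting points, not settings tuned for this model, and the Arabic prompt means "Write a short paragraph about artificial intelligence":

```bash
# -n: max new tokens; -c: context size; -t: CPU threads (match physical cores);
# --temp: sampling temperature; -ngl: layers offloaded to GPU (requires a GPU build).
./build/bin/llama-cli \
  -m ALLaM-7B-Instruct-Q4_K_M.gguf \
  -p "اكتب فقرة قصيرة عن الذكاء الاصطناعي." \
  -n 256 -c 4096 -t 8 --temp 0.7 -ngl 99
```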
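### 🌐 **Serving over HTTP**

If you prefer an HTTP API to an interactive CLI, `llama.cpp` also ships `llama-server`, which exposes an OpenAI-compatible endpoint. A minimal sketch; the port and context size are arbitrary choices:

```bash
# Start the server with a quantized model.
./build/bin/llama-server -m ALLaM-7B-Instruct-Q4_K_M.gguf -c 4096 --port 8080

# In another shell: query the OpenAI-compatible chat endpoint.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "كيف أجهز كوب شاي؟"}]}'
```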
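### 🧮 **Reproducing the Quantizations**

The variants in the quantization table were produced with standard `llama.cpp` tooling; a sketch of that pipeline, assuming you have a local checkout of the original checkpoint (paths are placeholders) and have installed the converter's requirements (`pip install -r requirements.txt` inside `llama.cpp`):

```bash
# 1. Convert the original HF checkpoint to an FP16 GGUF file.
#    "./ALLaM-7B-Instruct-preview" is a placeholder for your local copy of the base model.
python convert_hf_to_gguf.py ./ALLaM-7B-Instruct-preview \
  --outtype f16 --outfile ALLaM-7B-Instruct-f16.gguf

# 2. Quantize the FP16 file down to any of the listed formats, e.g. Q4_K_M.
./build/bin/llama-quantize ALLaM-7B-Instruct-f16.gguf \
  ALLaM-7B-Instruct-Q4_K_M.gguf Q4_K_M
```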
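### ⏱️ **Measuring Throughput**

To check the benchmark table against your own hardware, `llama.cpp` includes `llama-bench`. A sketch; the prompt/generation lengths and thread count are arbitrary examples:

```bash
# Reports prompt-processing (-p tokens) and generation (-n tokens) throughput.
./build/bin/llama-bench -m ALLaM-7B-Instruct-Q4_K_M.gguf -p 512 -n 128 -t 8
```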