---
language:
- en
base_model:
- allenai/Llama-3.1-Tulu-3-8B
tags:
- llama
- math
- conversational
---
# Quantized Llama-3.1-Tulu-3-8B Models

This repository contains Q4_KM and Q5_KM quantized versions of the [allenai/Llama-3.1-Tulu-3-8B](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) model. These quantized variants provide efficient alternatives while maintaining the core capabilities of Tülu3, a leading instruction-following model family.

## Model Overview

- **Original Model**: Llama-3.1-Tulu-3-8B
- **Quantized Versions**:
  - Q4_KM (4-bit quantization)
  - Q5_KM (5-bit quantization)
- **Base Architecture**: 8B-parameter instruction-following model
- **Developer**: Allen Institute for AI
- **License**: Llama 3.1 Community License Agreement
- **Language**: Primarily English
- **Finetuned From**: allenai/Llama-3.1-Tulu-3-8B-DPO
## Quantization Details

### Q4_KM Version
- Model size reduction: ~75% smaller than the original
- Memory footprint: 4.92 GB
- Optimized for deployment in resource-constrained environments
- Maintains core functionality with minimal performance impact

### Q5_KM Version
- Model size reduction: ~69% smaller than the original
- Memory footprint: 5.73 GB
- Higher precision than Q4_KM
- Better preservation of model quality
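
Either variant can be fetched programmatically with the `huggingface_hub` client. A minimal sketch, assuming the repo ID and GGUF filename shown below (both are illustrative; check this repository's file list for the exact names):

```python
from huggingface_hub import hf_hub_download

# Repo ID and filename are assumptions; verify them against this
# repository's "Files and versions" tab before running.
model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/Llama-3.1-Tulu-3-8B-GGUF",
    filename="Llama-3.1-Tulu-3-8B.Q4_K_M.gguf",  # or the Q5_K_M file (~5.73 GB)
)
print(model_path)  # local cached path, ready to pass to llama-cpp-python
```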
## Key Features

Both quantized versions maintain Tülu3's state-of-the-art performance on:
- Instruction-following tasks
- Mathematical reasoning (MATH dataset)
- Grade-school math problems (GSM8K)
- General instruction following (IFEval)
- Chat-based interactions
- Complex reasoning tasks

## Usage

```python
from llama_cpp import Llama

# Load the quantized model (adjust the path to wherever you saved the GGUF file)
llm = Llama(
    model_path="./models/7B/Llama-3.1-Tulu-3-8B.gguf",
    verbose=False,
    # n_gpu_layers=-1,  # Uncomment to use GPU acceleration
    # n_ctx=2048,       # Uncomment to increase the context window
)

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an AI assistant who helps answer user questions."},
        {"role": "user", "content": "Write Python code to find prime numbers."},
    ]
)

print(output["choices"][0]["message"]["content"])
```
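
`llama-cpp-python` can also stream tokens as they are generated, which suits chat-style use. A minimal follow-up sketch, reusing the `llm` instance from above on a GSM8K-style question:

```python
# Request a streamed response instead of waiting for the full completion
stream = llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "A tray holds 12 muffins. How many muffins fit on 7 trays?"}
    ],
    stream=True,
)

# Each chunk carries an OpenAI-style delta; print content pieces as they arrive
for chunk in stream:
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```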

## Training Data

The model was trained on a diverse mix of:
- Publicly available datasets
- Synthetic data
- Human-created datasets

## Bias, Risks, and Limitations

These quantized models inherit the limitations of the original Tülu3 model:
- Limited safety training compared to models with active filtering
- Can produce problematic outputs, especially when prompted to do so
- Unknown composition of the base Llama 3.1 training corpus
- Additional considerations for the quantized versions:
  - Slight degradation in performance compared to the full-precision model
  - May show increased variance in mathematical reasoning tasks
  - Q4_KM may exhibit more pronounced quality loss in complex scenarios

## Recommended Use Cases

- Research and development
- Educational applications
- Resource-constrained deployments
- Edge computing scenarios (see the sketch below)
- Prototyping and testing
- Applications requiring faster inference
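
For the resource-constrained and edge scenarios above, `llama-cpp-python` exposes settings that cap memory and CPU usage. A minimal sketch with illustrative values; tune them to your hardware and adjust the model path to your local GGUF file:

```python
from llama_cpp import Llama

# Conservative settings for a small-memory, CPU-only device (illustrative values)
llm = Llama(
    model_path="./models/7B/Llama-3.1-Tulu-3-8B.gguf",  # the Q4_KM file keeps RAM usage lowest
    n_ctx=1024,     # smaller context window -> smaller KV cache
    n_threads=4,    # match the physical cores available on the device
    n_batch=128,    # smaller prompt-processing batches reduce peak memory
    verbose=False,
)

# Plain completion call; returns an OpenAI-style dict
result = llm("Q: What is 17 + 25?\nA:", max_tokens=16)
print(result["choices"][0]["text"])
```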

## Acknowledgments

These quantized models are based on the work of the Allen Institute for AI and the Llama 3.1 team. Special thanks to Georgi Gerganov and the entire llama.cpp development team for their outstanding contributions.

## Contact

For any inquiries or support, please contact us at [email protected] or visit our [Website](https://www.sandlogic.com/).