---
license: apache-2.0
license_link: https://github.com/mistralai/mistral-common/blob/main/LICENCE
library_name: llama.cpp
library_link: https://github.com/ggerganov/llama.cpp
base_model:
- mistralai/Mixtral-8x7B-v0.1
language:
- fr
- it
- de
- es
- en
pipeline_tag: text-generation
tags:
- nlp
- code
- gguf
- sparse
- mixture-of-experts
- code-generation
---
## Mixtral 8x7B Instruct v0.1
### Quantized Model Files
The Mixtral 8x7B Sparse Mixture of Experts (SMoE) model is available in two formats:
- **ggml-model-q4_0.gguf**: 4-bit quantization for reduced memory and compute overhead.
- **ggml-model-q8_0.gguf**: 8-bit quantization, trading a larger memory footprint for precision close to the original weights.
The two quantizations target different hardware budgets, from lightweight devices to large-scale inference servers.
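As a quick smoke test that a downloaded file loads, here is a minimal sketch using the `llama-cpp-python` bindings (an assumption: the bindings are installed via `pip install llama-cpp-python` and the GGUF file is in the working directory; the native `llama.cpp` CLI works just as well):

```python
# Minimal load/generate check with the llama-cpp-python bindings (assumed installed).
from llama_cpp import Llama

llm = Llama(
    model_path="ggml-model-q4_0.gguf",  # or ggml-model-q8_0.gguf
    n_ctx=4096,        # context to allocate; the model supports up to 32k tokens
    n_gpu_layers=-1,   # offload all layers if a GPU-enabled build is available
)

# Mistral-style instruction format.
output = llm(
    "[INST] Write a Python one-liner that reverses a string. [/INST]",
    max_tokens=128,
    temperature=0.2,
)
print(output["choices"][0]["text"])
```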
### Model Information
Mixtral 8x7B is a generative Sparse Mixture of Experts (SMoE) model designed to deliver high-quality outputs with significant computational efficiency. A learned routing mechanism dynamically activates a small subset of experts for each token, reducing computational cost while maintaining the performance of a much larger dense model.
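The routing idea can be illustrated with a toy sketch of top-2 gating over 8 experts (purely illustrative PyTorch with made-up layer sizes, not Mixtral's actual implementation):

```python
# Toy top-2 expert routing, illustrative only (not Mixtral's code).
import torch
import torch.nn.functional as F

hidden_size, num_experts, top_k = 16, 8, 2
tokens = torch.randn(4, hidden_size)  # a batch of 4 token representations

gate = torch.nn.Linear(hidden_size, num_experts, bias=False)  # router
experts = torch.nn.ModuleList(
    [torch.nn.Linear(hidden_size, hidden_size) for _ in range(num_experts)]
)

scores = gate(tokens)                                # router logits per token
weights, chosen = torch.topk(scores, top_k, dim=-1)  # pick the top-2 experts
weights = F.softmax(weights, dim=-1)                 # renormalize over the picks

out = torch.zeros_like(tokens)
for t in range(tokens.size(0)):        # only 2 of the 8 experts run per token
    for k in range(top_k):
        e = chosen[t, k].item()
        out[t] += weights[t, k] * experts[e](tokens[t])
```

Because only the selected experts run in the forward pass, the active parameter count stays far below the total parameter count.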
**Key Features:**
- **Architecture:** Decoder-only SMoE with 8 experts per feed-forward layer (2 selected per token); 46.7B total parameters, of which only 12.9B are active per token.
- **Context Window:** Supports up to 32k tokens, making it suitable for long-context applications.
- **Multilingual Capabilities:** Trained on French, Italian, German, Spanish, and English, making it robust for diverse linguistic tasks.
- **Performance:** Matches or exceeds Llama 2 70B and GPT-3.5 across several industry-standard benchmarks.
- **Fine-Tuning Potential:** Optimized for instruction-following use cases, with finetuning yielding strong improvements in dialogue and safety alignment.
**Developer**: Mistral AI
**Training Data**: Open web data, curated for quality and diverse representation.
**Application Areas**: Code generation, multilingual dialogue, and long-context processing.
### Core Library
Mixtral 8x7B Instruct can be deployed using `vLLM` or Hugging Face `transformers`; initial integrations focus on `transformers`. The quantized GGUF files in this repository are intended for `llama.cpp` and its bindings.
**Primary Framework**: `transformers`
**Alternate Framework**: `vLLM` (for specialized inference optimizations)
**Model Availability**: Source weights and pre-converted formats are available under [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).
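For the `transformers` path, a minimal sketch (assuming the upstream `mistralai/Mixtral-8x7B-Instruct-v0.1` weights and sufficient GPU memory; the GGUF files in this repository are loaded through `llama.cpp`, not `transformers`):

```python
# Sketch of loading the upstream instruct weights with transformers (assumed repo id).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize the Mixtral architecture in one sentence."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

generated = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```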
### Safety and Responsible Use
Mixtral 8x7B has been trained with an emphasis on ethical use and safety. It includes:
1. **Guardrails for Sensitive Content**: Optional system prompts to guide outputs.
2. **Self-Reflection Prompting**: a prompting mechanism that lets the model assess its own generations and classify responses as suitable or unsuitable for deployment.
Developers should always consider additional tuning or filtering depending on their application and context.