---
license: apache-2.0
license_link: https://github.com/mistralai/mistral-common/blob/main/LICENCE
library: llama.cpp
library_link: https://github.com/ggerganov/llama.cpp
base_model:
  - mistralai/Mixtral-8x7B-v0.1
language:
  - fr
  - it
  - de
  - es
  - en
pipeline_tag: text-generation
tags:
  - nlp
  - code
  - gguf
  - sparse
  - mixture-of-experts
  - code-generation
---

# Mixtral 8x7B Instruct v0.1

## Quantized Model Files

The Mixtral 8x7B Sparse Mixture of Experts (SMoE) model is provided here in two GGUF formats:

- `ggml-model-q4_0.gguf`: 4-bit quantization for reduced memory use and compute overhead.
- `ggml-model-q8_0.gguf`: 8-bit quantization, trading a larger footprint for precision closer to the original weights.

These quantized files support deployment on a wide range of hardware, from lightweight consumer devices to large-scale inference servers.
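
For orientation, here is a minimal sketch of loading the 4-bit file with llama-cpp-python and running one completion in Mixtral's `[INST] ... [/INST]` instruction format. The file path, context size, and sampling settings are illustrative, not recommendations from this repository.

```python
# Minimal sketch: load the 4-bit GGUF with llama-cpp-python
# (pip install llama-cpp-python) and run a single completion.
from llama_cpp import Llama

llm = Llama(
    model_path="ggml-model-q4_0.gguf",  # or ggml-model-q8_0.gguf
    n_ctx=32768,       # Mixtral supports up to 32k tokens of context
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

output = llm(
    "[INST] Write a Python function that reverses a string. [/INST]",
    max_tokens=256,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```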

## Model Information

Mixtral 8x7B is a generative Sparse Mixture of Experts (SMoE) model designed to deliver high-quality outputs with significant computational efficiency. A learned router dynamically activates only a small subset of experts for each token, reducing computational cost while maintaining the performance of a much larger dense model.

### Key Features

- **Architecture:** Decoder-only SMoE with 46.7B total parameters, of which only 12.9B are active per token.
- **Context Window:** Supports up to 32k tokens, making it suitable for long-context applications.
- **Multilingual Capabilities:** Trained on French, Italian, German, Spanish, and English, making it robust across diverse linguistic tasks.
- **Performance:** Matches or exceeds Llama 2 70B and GPT-3.5 on several industry-standard benchmarks.
- **Fine-Tuning Potential:** Optimized for instruction-following use cases; fine-tuning yields strong improvements in dialogue quality and safety alignment.
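
To make the routing idea above concrete, here is a small, self-contained sketch of top-2-of-8 expert routing in PyTorch. The class name, layer sizes, and expert structure are illustrative only and do not reproduce Mixtral's actual implementation.

```python
# Illustrative sketch of top-2-of-8 expert routing in an SMoE layer.
# Dimensions and names are hypothetical; this is not Mixtral source code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, d_model)
        logits = self.router(x)                # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e          # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k, None] * self.experts[e](x[mask])
        return out

tokens = torch.randn(5, 64)
print(TinyMoELayer()(tokens).shape)  # torch.Size([5, 64])
```

Each token only passes through its two selected experts, which is why compute per token stays far below the model's total parameter count.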

- **Developer:** Mistral AI
- **Training Data:** Open web data, curated for quality and diverse representation.
- **Application Areas:** Code generation, multilingual dialogue, and long-context processing.

## Core Library

Mixtral 8x7B Instruct can be deployed with either Hugging Face transformers or vLLM. Current support focuses on transformers for initial integrations.

- **Primary Framework:** transformers
- **Alternate Framework:** vLLM (for specialized inference optimizations)
- **Model Availability:** Source weights and pre-converted formats are available under Apache 2.0.
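
A minimal sketch of loading the upstream instruct weights with transformers is shown below. The model id points to Mistral AI's original repository; the dtype and device settings are illustrative and assume a GPU with enough memory.

```python
# Minimal sketch: load the upstream instruct weights with transformers
# and generate one reply using the model's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain sparse mixture-of-experts briefly."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```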

## Safety and Responsible Use

Mixtral 8x7B has been trained with an emphasis on ethical use and safety. It supports:

1. **Guardrails for Sensitive Content:** Optional system prompts to guide outputs.
2. **Self-Reflection Prompting:** A mechanism for internal assessment of generated outputs, allowing the model to classify its responses as suitable or unsuitable for deployment.

Developers should always consider additional fine-tuning or output filtering appropriate to their application and deployment context.
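
As a rough sketch of the optional-system-prompt guardrail, the snippet below prepends safety instructions to the user message inside the `[INST] ... [/INST]` instruction format. The guardrail wording here is illustrative, not Mistral AI's published prompt.

```python
# Sketch: prepend guardrail instructions to the user message before
# sending it to the model. The guardrail text is illustrative only.
GUARDRAIL = (
    "Always assist with care and respect. Refuse requests for harmful, "
    "unethical, or illegal content and explain why."
)

def build_prompt(user_message: str, guardrail: str = GUARDRAIL) -> str:
    return f"[INST] {guardrail}\n\n{user_message} [/INST]"

print(build_prompt("Summarize the Apache 2.0 license in one sentence."))
```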