---
library_name: transformers
base_model: meta-llama/Meta-Llama-3.1-70B-Instruct
license: llama3.1
model-index:
- name: Meta-Llama-3.1-70B-Instruct-INT8
  results: []
language:
- en
- de
- fr
- it
- pt
- hi
- es
- th
tags:
- facebook
- meta
- pytorch
- llama
- llama-3
---

# Model Card for Meta-Llama-3.1-70B-Instruct-INT8

This is a quantized version of `Llama 3.1 70B Instruct`, quantized to **8-bit** using `bitsandbytes` and `accelerate`.

- **Developed by:** Farid Saud @ DSRS
- **License:** llama3.1
- **Base model:** meta-llama/Meta-Llama-3.1-70B-Instruct

## Use this model

Use a pipeline as a high-level helper:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="fsaudm/Meta-Llama-3.1-70B-Instruct-INT8")

messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

Or load the model directly:

```python
# Load the tokenizer and model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("fsaudm/Meta-Llama-3.1-70B-Instruct-INT8")
model = AutoModelForCausalLM.from_pretrained("fsaudm/Meta-Llama-3.1-70B-Instruct-INT8")
```

More information on the base model can be found in the original [meta-llama/Meta-Llama-3.1-70B-Instruct](https://huggingface.co./meta-llama/Meta-Llama-3.1-70B-Instruct) card.