---
library_name: transformers
tags:
- 4bit
- bnb
- bitsandbytes
- llama
- llama-2
- facebook
- meta
- 7b
- quantized
license: llama2
pipeline_tag: text-generation
---

# Model Card for alokabhishek/Llama-2-7b-chat-hf-bnb-4bit

This repo contains a 4-bit quantized (using bitsandbytes) version of Meta's [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co./meta-llama/Llama-2-7b-chat-hf).

## Model Details

- Model creator: [Meta](https://huggingface.co./meta-llama)
- Original model: [Llama-2-7b-chat-hf](https://huggingface.co./meta-llama/Llama-2-7b-chat-hf)

### About 4-bit quantization using bitsandbytes

- QLoRA paper: [arXiv - QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
- Hugging Face blog post on 4-bit quantization using bitsandbytes: [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co./blog/4bit-transformers-bitsandbytes)
- bitsandbytes GitHub repo: [bitsandbytes github repo](https://github.com/TimDettmers/bitsandbytes)

# How to Get Started with the Model

Use the code below to get started with the model.

## How to run from Python code

#### First install the packages

```shell
pip install -q -U bitsandbytes accelerate torch huggingface_hub
pip install -q -U git+https://github.com/huggingface/transformers.git  # install the latest version of transformers
pip install -q -U git+https://github.com/huggingface/peft.git
pip install flash-attn --no-build-isolation
```

#### Import

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
```

#### Use a pipeline as a high-level helper

```python
model_id_llama = "alokabhishek/Llama-2-7b-chat-hf-bnb-4bit"

# Load the tokenizer and the pre-quantized 4-bit model
tokenizer_llama = AutoTokenizer.from_pretrained(model_id_llama, use_fast=True)
model_llama = AutoModelForCausalLM.from_pretrained(
    model_id_llama, device_map="auto"
)

pipe_llama = pipeline(task="text-generation", model=model_llama, tokenizer=tokenizer_llama)

prompt_llama = "Tell me a funny joke about Large Language Models meeting a black hole in an intergalactic bar."

output_llama = pipe_llama(prompt_llama, max_new_tokens=512)
print(output_llama[0]["generated_text"])
```

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
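
## Appendix: Reproducing the 4-bit quantization (illustrative sketch)

The card does not document the exact quantization settings used to produce this repo. The sketch below shows how a 4-bit bitsandbytes quantization of the original `meta-llama/Llama-2-7b-chat-hf` checkpoint could be produced with `BitsAndBytesConfig`, assuming the NF4 + double-quantization settings described in the QLoRA paper and Hugging Face blog post linked above. The configuration values, the target Hub repo name, and access to the gated original checkpoint are assumptions for illustration, not a statement of how this repo was actually built.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed settings: NF4 quantization with double quantization and bfloat16
# compute, following the QLoRA recipe. The actual settings used for this
# repo are not documented in the card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Gated model: requires an accepted license and an authenticated Hugging Face token.
base_model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# With recent transformers/bitsandbytes releases that support 4-bit
# serialization, the quantized weights could then be pushed to the Hub
# (hypothetical repo name):
# model.push_to_hub("your-username/Llama-2-7b-chat-hf-bnb-4bit")
# tokenizer.push_to_hub("your-username/Llama-2-7b-chat-hf-bnb-4bit")
```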