---
library_name: transformers
tags:
- 4bit
- bnb
- bitsandbytes
- llama
- llama-2
- facebook
- meta
- 7b
- quantized
license: llama2
pipeline_tag: text-generation
---

# Model Card for alokabhishek/Llama-2-7b-chat-hf-bnb-4bit

This repo contains a 4-bit quantized (using bitsandbytes) version of Meta's [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co./meta-llama/Llama-2-7b-chat-hf).

## Model Details

- Model creator: [Meta](https://huggingface.co./meta-llama)
- Original model: [Llama-2-7b-chat-hf](https://huggingface.co./meta-llama/Llama-2-7b-chat-hf)

### About 4-bit quantization using bitsandbytes

- QLoRA paper: [arXiv - QLoRA: Efficient Finetuning of Quantized LLMs](https://arxiv.org/abs/2305.14314)
- Hugging Face blog post on 4-bit quantization using bitsandbytes: [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co./blog/4bit-transformers-bitsandbytes)
- bitsandbytes GitHub repo: [bitsandbytes github repo](https://github.com/TimDettmers/bitsandbytes)

# How to Get Started with the Model

Use the code below to get started with the model.

## How to run from Python code

#### First install the packages

```shell
pip install -q -U bitsandbytes accelerate torch huggingface_hub
pip install -q -U git+https://github.com/huggingface/transformers.git  # install the latest version of transformers
pip install -q -U git+https://github.com/huggingface/peft.git
pip install flash-attn --no-build-isolation
```

#### Import

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
```

#### Use a pipeline as a high-level helper

```python
model_id_llama = "alokabhishek/Llama-2-7b-chat-hf-bnb-4bit"

# Load the tokenizer and the pre-quantized 4-bit model
tokenizer_llama = AutoTokenizer.from_pretrained(model_id_llama, use_fast=True)
model_llama = AutoModelForCausalLM.from_pretrained(
    model_id_llama, device_map="auto"
)

pipe_llama = pipeline(task="text-generation", model=model_llama, tokenizer=tokenizer_llama)

prompt_llama = "Tell me a funny joke about Large Language Models meeting a black hole in an intergalactic bar."

output_llama = pipe_llama(prompt_llama, max_new_tokens=512)
print(output_llama[0]["generated_text"])
```

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]
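
## Appendix: Reproducing the 4-bit quantization (illustrative sketch)

The card does not document the exact quantization settings used to produce this repo. The sketch below shows how a 4-bit bitsandbytes quantization of the original `meta-llama/Llama-2-7b-chat-hf` checkpoint could be produced with `BitsAndBytesConfig`, assuming the NF4 + double-quantization settings described in the QLoRA paper and Hugging Face blog post linked above. The configuration values, the target Hub repo name, and access to the gated original checkpoint are assumptions for illustration, not a statement of how this repo was actually built.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Assumed settings: NF4 quantization with double quantization and bfloat16
# compute, following the QLoRA recipe. The actual settings used for this
# repo are not documented in the card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Gated model: requires an accepted license and an authenticated Hugging Face token.
base_model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(base_model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# With recent transformers/bitsandbytes releases that support 4-bit
# serialization, the quantized weights could then be pushed to the Hub
# (hypothetical repo name):
# model.push_to_hub("your-username/Llama-2-7b-chat-hf-bnb-4bit")
# tokenizer.push_to_hub("your-username/Llama-2-7b-chat-hf-bnb-4bit")
```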