|
--- |
|
inference: false |
|
language: |
|
- en |
|
license: other |
|
model_creator: Meta Llama 2 |
|
model_link: https://huggingface.co./meta-llama/Llama-2-13b-chat-hf |
|
model_name: Llama 2 13B Chat |
|
model_type: llama |
|
pipeline_tag: text-generation |
|
quantized_by: TheBloke |
|
tags: |
|
- facebook |
|
- meta |
|
- pytorch |
|
- llama |
|
- llama-2 |
|
--- |
|
|
|
# Llama 2 13B Chat - GPTQ |
|
- Model creator: [Meta Llama 2](https://huggingface.co./meta-llama) |
|
- Original model: [Llama 2 13B Chat](https://huggingface.co./meta-llama/Llama-2-13b-chat-hf) |
|
- Original GPTQ model repo: [Llama-2-13B-chat-GPTQ](https://huggingface.co./TheBloke/Llama-2-13B-chat-GPTQ)
|
|
|
<!-- description start --> |
|
## Description |
|
|
|
This repo contains GPTQ model files for [Meta's Llama 2 13B-chat](https://huggingface.co./meta-llama/Llama-2-13b-chat-hf). |
|
|
|
Multiple GPTQ parameter permutations are provided; see the Provided Files section of the [original GPTQ repo](https://huggingface.co./TheBloke/Llama-2-13B-chat-GPTQ) for details of the options provided, their parameters, and the software used to create them.
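Each permutation lives in its own branch of the repo. If you want to fetch a specific branch ahead of time, a minimal sketch using `huggingface_hub` is shown below; the branch name is the example one referenced later in this README, and the local directory name is just an illustration:

```python
# Sketch: download one GPTQ branch locally with huggingface_hub.
# The revision below is the example branch referenced later in this README;
# check the repo's branch list for the full set of quantisation options.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Llama-2-13B-chat-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # use "main" for the default branch
    local_dir="Llama-2-13B-chat-GPTQ",       # illustrative local path
)
```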
|
|
|
<!-- prompt-template start --> |
|
## Prompt template: Llama-2-Chat |
|
|
|
``` |
|
[INST] <<SYS>> |
|
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. |
|
<</SYS>> |
|
{prompt}[/INST] |
|
|
|
``` |
|
|
|
<!-- prompt-template end --> |
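If you build prompts in your own code, a small helper keeps the template in one place. This is a minimal sketch; the function name and the shortened system message are illustrative, not part of the original card:

```python
# Sketch: fill the Llama-2-Chat template with a system message and a user prompt.
DEFAULT_SYSTEM = (
    "You are a helpful, respectful and honest assistant. "
    "Always answer as helpfully as possible, while being safe."
)

def build_llama2_chat_prompt(prompt: str, system: str = DEFAULT_SYSTEM) -> str:
    # Mirrors the template shown above: system block wrapped in <<SYS>> tags,
    # whole turn wrapped in [INST] ... [/INST].
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n{prompt}[/INST]"

print(build_llama2_chat_prompt("Tell me about AI"))
```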
|
|
|
|
|
<!-- README_GPTQ.md-use-from-python start --> |
|
## How to use this GPTQ model from Python code |
|
|
|
### Install the necessary packages |
|
|
|
Requires: Transformers 4.32.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later. |
|
|
|
```shell |
|
pip3 install "transformers>=4.32.0" "optimum>=1.12.0"
|
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/ # Use cu117 if on CUDA 11.7 |
|
``` |
|
|
|
If you have problems installing AutoGPTQ using the pre-built wheels, install it from source instead: |
|
|
|
```shell |
|
pip3 uninstall -y auto-gptq |
|
git clone https://github.com/PanQiWei/AutoGPTQ |
|
cd AutoGPTQ |
|
pip3 install . |
|
``` |
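To confirm that the installed versions meet the requirements above, you can check them from Python. This is a quick sanity-check sketch using the standard library:

```python
# Sketch: verify installed package versions against the minimums stated above.
from importlib.metadata import version

for pkg, minimum in [("transformers", "4.32.0"), ("optimum", "1.12.0"), ("auto-gptq", "0.4.2")]:
    print(f"{pkg}: installed {version(pkg)}, required >= {minimum}")
```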
|
|
|
### For CodeLlama models only: you must use Transformers 4.33.0 or later. |
|
|
|
If 4.33.0 is not yet released when you read this, you will need to install Transformers from source: |
|
```shell |
|
pip3 uninstall -y transformers |
|
pip3 install git+https://github.com/huggingface/transformers.git |
|
``` |
|
|
|
### You can then use the following code |
|
|
|
```python |
|
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
|
|
|
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ" |
|
# To use a different branch, change revision |
|
# For example: revision="gptq-4bit-32g-actorder_True" |
|
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, |
|
torch_dtype=torch.float16, |
|
device_map="auto", |
|
revision="main") |
|
|
|
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True) |
|
|
|
prompt = "Tell me about AI" |
|
prompt_template=f'''[INST] <<SYS>> |
|
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information. |
|
<</SYS>> |
|
{prompt}[/INST] |
|
|
|
''' |
|
|
|
print("\n\n*** Generate:") |
|
|
|
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda() |
|
output = model.generate(inputs=input_ids, do_sample=True, temperature=0.7, max_new_tokens=512)  # do_sample=True is needed for temperature to take effect
|
print(tokenizer.decode(output[0])) |
|
|
|
# Inference can also be done using transformers' pipeline |
|
|
|
print("*** Pipeline:") |
|
pipe = pipeline( |
|
"text-generation", |
|
model=model, |
|
tokenizer=tokenizer, |
|
max_new_tokens=512, |
|
do_sample=True,
temperature=0.7,
|
top_p=0.95, |
|
repetition_penalty=1.15 |
|
) |
|
|
|
print(pipe(prompt_template)[0]['generated_text']) |
|
``` |
|
<!-- README_GPTQ.md-use-from-python end --> |
|
|
|
<!-- README_GPTQ.md-compatibility start --> |
|
## Compatibility |
|
|
|
The files provided are tested to work with AutoGPTQ, both via Transformers and using AutoGPTQ directly. They should also work with [Occ4m's GPTQ-for-LLaMa fork](https://github.com/0cc4m/KoboldAI). |
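If you prefer to load the model with AutoGPTQ's own loader rather than through Transformers, a minimal sketch looks like the following; the keyword arguments are illustrative defaults, not settings recommended by the repo:

```python
# Sketch: load the quantised model with AutoGPTQ's loader instead of Transformers.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    use_safetensors=True,  # assumes the quantised weights are shipped as safetensors
    device="cuda:0",
)
```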
|
|
|
[ExLlama](https://github.com/turboderp/exllama) is compatible with Llama models in 4-bit. Please see the Provided Files table in the [original GPTQ repo](https://huggingface.co./TheBloke/Llama-2-13B-chat-GPTQ) for per-file compatibility.
|
|
|
[Huggingface Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is compatible with all GPTQ models. |
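For example, once a TGI server is running with this model, it can be queried from Python via `huggingface_hub`'s `InferenceClient`. This is a sketch; the endpoint URL and generation parameters are assumptions, not values from this card:

```python
# Sketch: query a locally running TGI instance serving this GPTQ model.
# The endpoint URL is an assumption; point it at your own TGI server.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")
response = client.text_generation(
    "[INST] Tell me about AI [/INST]",
    max_new_tokens=512,
    temperature=0.7,
)
print(response)
```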
|
<!-- README_GPTQ.md-compatibility end --> |
|
|
|
## Ethical Considerations and Limitations |
|
Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 2's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or otherwise objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model.
|
|
|
Please see the Responsible Use Guide available at [ai.meta.com/llama/responsible-use-guide](https://ai.meta.com/llama/responsible-use-guide/).
|
|
|
## Reporting Issues |
|
Please report any software bugs or other problems with the models through one of the following means:
|
- Reporting issues with the model: [github.com/facebookresearch/llama](http://github.com/facebookresearch/llama) |
|
- Reporting problematic content generated by the model: [developers.facebook.com/llama_output_feedback](http://developers.facebook.com/llama_output_feedback) |
|
- Reporting bugs and security concerns: [facebook.com/whitehat/info](http://facebook.com/whitehat/info) |
|
|
|
## Llama Model Index |
|
|Model|Llama2|Llama2-hf|Llama2-chat|Llama2-chat-hf|
|---|---|---|---|---|
|7B| [Link](https://huggingface.co./meta-llama/Llama-2-7b) | [Link](https://huggingface.co./meta-llama/Llama-2-7b-hf) | [Link](https://huggingface.co./meta-llama/Llama-2-7b-chat) | [Link](https://huggingface.co./meta-llama/Llama-2-7b-chat-hf)|
|13B| [Link](https://huggingface.co./meta-llama/Llama-2-13b) | [Link](https://huggingface.co./meta-llama/Llama-2-13b-hf) | [Link](https://huggingface.co./meta-llama/Llama-2-13b-chat) | [Link](https://huggingface.co./meta-llama/Llama-2-13b-chat-hf)|
|70B| [Link](https://huggingface.co./meta-llama/Llama-2-70b) | [Link](https://huggingface.co./meta-llama/Llama-2-70b-hf) | [Link](https://huggingface.co./meta-llama/Llama-2-70b-chat) | [Link](https://huggingface.co./meta-llama/Llama-2-70b-chat-hf)|
|
|