---
inference: false
language:
- en
license: other
model_creator: Meta Llama 2
model_link: https://huggingface.co./meta-llama/Llama-2-13b-chat-hf
model_name: Llama 2 13B Chat
model_type: llama
pipeline_tag: text-generation
quantized_by: TheBloke
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
---

# Llama 2 13B Chat - GPTQ

- Model creator: [Meta Llama 2](https://huggingface.co./meta-llama)
- Original model: [Llama 2 13B Chat](https://huggingface.co./meta-llama/Llama-2-13b-chat-hf)
- Original GPTQ model repo: [Llama-2-13B-chat-GPTQ](https://huggingface.co./TheBloke/Llama-2-13B-chat-GPTQ)

## Description

This repo contains GPTQ model files for [Meta's Llama 2 13B-chat](https://huggingface.co./meta-llama/Llama-2-13b-chat-hf).

Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.

## Prompt template: Llama-2-Chat

```
[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]
```

## How to use this GPTQ model from Python code

### Install the necessary packages

Requires: Transformers 4.32.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.

```shell
pip3 install transformers>=4.32.0 optimum>=1.12.0
pip3 install auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/  # Use cu117 if on CUDA 11.7
```

If you have problems installing AutoGPTQ using the pre-built wheels, install it from source instead:

```shell
pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
pip3 install .
```

### For CodeLlama models only: you must use Transformers 4.33.0 or later.

If 4.33.0 is not yet released when you read this, you will need to install Transformers from source:

```shell
pip3 uninstall -y transformers
pip3 install git+https://github.com/huggingface/transformers.git
```

### You can then use the following code

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
# To use a different branch, change revision
# For example: revision="gptq-4bit-32g-actorder_True"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
                                             torch_dtype=torch.float16,
                                             device_map="auto",
                                             revision="main")

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Tell me about AI"
prompt_template = f'''[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>
{prompt}[/INST]
'''

print("\n\n*** Generate:")

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))

# Inference can also be done using transformers' pipeline

print("*** Pipeline:")
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.95,
    repetition_penalty=1.15
)

print(pipe(prompt_template)[0]['generated_text'])
```

## Compatibility

The files provided are tested to work with AutoGPTQ, both via Transformers and using AutoGPTQ directly. They should also work with [Occ4m's GPTQ-for-LLaMa fork](https://github.com/0cc4m/KoboldAI).

[ExLlama](https://github.com/turboderp/exllama) is compatible with Llama models in 4-bit. Please see the Provided Files table above for per-file compatibility.

[Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) is compatible with all GPTQ models.
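As noted above, the files can also be loaded with AutoGPTQ's own API rather than through Transformers. The following is a minimal sketch of that route, assuming AutoGPTQ 0.4.2 or later is installed as described earlier; `from_quantized` reads the quantization settings from the repo's config, and the prompt and sampling values here are illustrative, not recommendations:

```python
import torch
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# AutoGPTQ's native loader; quantization parameters are picked up from
# quantize_config.json in the repo, so none need to be repeated here
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           use_safetensors=True,
                                           device="cuda:0")

# System prompt shortened here for brevity; use the full Llama-2-Chat
# template above in practice
prompt_template = "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\nTell me about AI[/INST]"

input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```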
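With TGI, inference happens over HTTP, so the client needs only an endpoint rather than local GPU weights. Below is a sketch using the `InferenceClient` from the `huggingface_hub` package, assuming a TGI server is already serving this model at the placeholder address shown (how you launch the server depends on your TGI setup); sampling parameters mirror the pipeline example above:

```python
from huggingface_hub import InferenceClient

# Placeholder endpoint: replace with wherever your TGI server is listening
client = InferenceClient(model="http://127.0.0.1:8080")

# System prompt shortened for brevity; use the full template in practice
prompt_template = "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\nTell me about AI[/INST]"

response = client.text_generation(prompt_template,
                                  max_new_tokens=512,
                                  do_sample=True,
                                  temperature=0.7,
                                  top_p=0.95,
                                  repetition_penalty=1.15)
print(response)
```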
## Ethical Considerations and Limitations

Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover, all scenarios. For these reasons, as with all LLMs, Llama 2's potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate, biased or otherwise objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their specific applications of the model.

Please see the Responsible Use Guide available at [https://ai.meta.com/llama/responsible-use-guide/](https://ai.meta.com/llama/responsible-use-guide/).

## Reporting Issues

Please report any software "bug" or other problems with the models through one of the following means:

- Reporting issues with the model: [github.com/facebookresearch/llama](http://github.com/facebookresearch/llama)
- Reporting problematic content generated by the model: [developers.facebook.com/llama_output_feedback](http://developers.facebook.com/llama_output_feedback)
- Reporting bugs and security concerns: [facebook.com/whitehat/info](http://facebook.com/whitehat/info)

## Llama Model Index

|Model|Llama2|Llama2-hf|Llama2-chat|Llama2-chat-hf|
|---|---|---|---|---|
|7B| [Link](https://huggingface.co./llamaste/Llama-2-7b) | [Link](https://huggingface.co./llamaste/Llama-2-7b-hf) | [Link](https://huggingface.co./llamaste/Llama-2-7b-chat) | [Link](https://huggingface.co./llamaste/Llama-2-7b-chat-hf)|
|13B| [Link](https://huggingface.co./llamaste/Llama-2-13b) | [Link](https://huggingface.co./llamaste/Llama-2-13b-hf) | [Link](https://huggingface.co./llamaste/Llama-2-13b-chat) | [Link](https://huggingface.co./llamaste/Llama-2-13b-chat-hf)|
|70B| [Link](https://huggingface.co./llamaste/Llama-2-70b) | [Link](https://huggingface.co./llamaste/Llama-2-70b-hf) | [Link](https://huggingface.co./llamaste/Llama-2-70b-chat) | [Link](https://huggingface.co./llamaste/Llama-2-70b-chat-hf)|