Llama-2-13B-GPTQ-Orca

This model is a fine-tuned version of TheBloke/Llama-2-13B-GPTQ, trained on the Open-Orca/OpenOrca dataset.

Prompt template:

### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:
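As a minimal sketch, the template above can be filled in with plain string formatting. The names `PROMPT_TEMPLATE` and `build_prompt` are illustrative and not part of this repo. The sketch assumes the `\n` sequences in the template stand for real newlines and that nothing follows `### Response:`.

```python
# Template copied from this model card; "\n" is assumed to mean a real newline.
PROMPT_TEMPLATE = "### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:"


def build_prompt(instruction: str, system: str = "") -> str:
    """Fill the prompt template (hypothetical helper, not part of the repo).

    An empty `system` string corresponds to the empty system message.
    """
    return PROMPT_TEMPLATE.format(system=system, instruction=instruction)


prompt = build_prompt(
    "Explain what GPTQ quantization is.",
    system="You are an AI assistant that helps people find information.",
)
print(prompt)
```

Keeping the template in one constant makes it easier to match the format used during fine-tuning exactly.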

The model was trained with the following 16 system messages, which were used to generate the training examples (see the Orca paper):

<empty system message>
You are an AI assistant. Provide a detailed answer so user don’t need to search outside to understand the answer.
You are an AI assistant. You will be given a task. You must generate a detailed and long answer.
You are a helpful assistant, who always provide explanation. Think like you are answering to a five year old.
You are an AI assistant that follows instruction extremely well. Help as much as you can.
You are an AI assistant that helps people find information. Provide a detailed answer so user don’t need to search outside to understand the answer.
You are an AI assistant. User will you give you a task. Your goal is to complete the task as faithfully as you can. While performing the task think step-by-step and justify your steps.
You should describe the task and explain your answer. While answering a multiple choice question, first output the correct answer(s). Then explain why other answers are wrong. Think like you are answering to a five year old.
Explain how you used the definition to come up with the answer.
You are an AI assistant. You should describe the task and explain your answer. While answering a multiple choice question, first output the correct answer(s). Then explain why other answers are wrong. You might need to use additional knowledge to answer the question.
You are an AI assistant that helps people find information. User will you give you a question. Your task is to answer as faithfully as you can. While answering think step-by-step and justify your answer.
User will you give you a task with some instruction. Your job is follow the instructions as faithfully as you can. While answering think step-by-step and justify your answer.
You are a teacher. Given a task, you explain in simple steps what the task is asking, any guidelines it provides and how to use those guidelines to find the answer.
You are an AI assistant, who knows every language and how to translate one language to another. Given a task, you explain in simple steps what the task is asking, any guidelines that it provides. You solve the task and show how you used the guidelines to solve the task.
Given a definition of a task and a sample input, break the definition into small parts. Each of those parts will have some instruction. Explain their meaning by showing an example that meets the criteria in the instruction. Use the following format: Part #: a key part of the definition. Usage: Sample response that meets the criteria from the key part. Explain why you think it meets the criteria.
You are an AI assistant that helps people find information.

How to use this GPTQ model from Python code

First make sure you have AutoGPTQ installed:

GITHUB_ACTIONS=true pip install auto-gptq

To use this model, download the base model from TheBloke/Llama-2-13B-GPTQ, then load the adapter from this repo on top of it. Then try the following example code:

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, get_gptq_peft_model

# Paths to the downloaded base model and to the adapter from this repo
MODEL_PATH_GPTQ = "Llama-2-13B-GPTQ"
ADAPTER_DIR = "Llama-2-13B-GPTQ-Orca"

DEV = "cuda:0"

# Load the tokenizer and the quantized base model
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH_GPTQ, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    MODEL_PATH_GPTQ,
    use_safetensors=True,
    trust_remote_code=False,
    use_triton=True,
    device=DEV,
    warmup_triton=False,
    trainable=True,
    inject_fused_attention=True,
    inject_fused_mlp=False,
)
# Attach the adapter for inference (train_mode=False)
model = get_gptq_peft_model(
    model,
    model_id=ADAPTER_DIR,
    train_mode=False
)
model.eval()
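
Once the model and tokenizer are loaded, generation could look roughly like the sketch below. The helper `generate_response` and its default `max_new_tokens=256` are hypothetical and not part of AutoGPTQ or this repo. The sketch assumes the tokenizer returns PyTorch tensors and that the prompt already follows the template given earlier in this card.

```python
def generate_response(model, tokenizer, prompt, device="cuda:0", max_new_tokens=256):
    """Generate a completion for an already formatted prompt (illustrative helper)."""
    # Tokenize the prompt and move the tensors to the same device as the model.
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    # Keyword arguments only: they are forwarded to the underlying generate().
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
    )
    # The output contains the prompt tokens followed by the new tokens,
    # so slice off the prompt before decoding.
    prompt_length = len(inputs["input_ids"][0])
    new_tokens = output_ids[0][prompt_length:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


# Example call (requires the model and tokenizer loaded as shown above):
# prompt = (
#     "### System:\nYou are an AI assistant that helps people find information."
#     "\n\n### User:\nWhat is GPTQ?\n\n### Response:"
# )
# print(generate_response(model, tokenizer, prompt, device=DEV))
```

Decoding only the newly generated tokens keeps the template text out of the returned answer.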

Compatibility

The files provided will work with AutoGPTQ (CUDA and Triton modes), GPTQ-for-LLaMa (only CUDA has been tested), and Occ4m's GPTQ-for-LLaMa fork.

ExLlama works with Llama models in 4-bit. Please see the Provided Files table above for per-file compatibility.

Developers

tridungduong16

Citation

@software{OpenOrca_Preview1,
  title = {OpenOrca_Preview1: A LLaMA-13B Model Fine-tuned on Small Portion of OpenOrcaV1 Dataset},
  author = {Wing Lian and Bleys Goodson and Eugene Pentland and Austin Cook and Chanvichet Vong and "Teknium"},
  year = {2023},
  publisher = {HuggingFace},
  journal = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/Open-Orca/OpenOrca-Preview1-13B}},
}

@misc{mukherjee2023orca,
      title={Orca: Progressive Learning from Complex Explanation Traces of GPT-4}, 
      author={Subhabrata Mukherjee and Arindam Mitra and Ganesh Jawahar and Sahaj Agarwal and Hamid Palangi and Ahmed Awadallah},
      year={2023},
      eprint={2306.02707},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

@misc{longpre2023flan,
      title={The Flan Collection: Designing Data and Methods for Effective Instruction Tuning}, 
      author={Shayne Longpre and Le Hou and Tu Vu and Albert Webson and Hyung Won Chung and Yi Tay and Denny Zhou and Quoc V. Le and Barret Zoph and Jason Wei and Adam Roberts},
      year={2023},
      eprint={2301.13688},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

@article{touvron2023llama,
  title={LLaMA: Open and Efficient Foundation Language Models},
  author={Touvron, Hugo and Lavril, Thibaut and Izacard, Gautier and Martinet, Xavier and Lachaux, Marie-Anne and Lacroix, Timoth{\'e}e and Rozi{\`e}re, Baptiste and Goyal, Naman and Hambro, Eric and Azhar, Faisal and Rodriguez, Aurelien and Joulin, Armand and Grave, Edouard and Lample, Guillaume},
  journal={arXiv preprint arXiv:2302.13971},
  year={2023}
}
