---
license: other
language:
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- llama
- decapoda-research-13b-hf
- prompt answering
- peft
---

## Model Card for Model ID

This repository contains a LLaMA-13B model further fine-tuned on conversations and question-answering prompts.

⚠️ **I used [LLaMA-13B-hf](https://huggingface.co./decapoda-research/llama-13b-hf) as a base model, so this model is for research purposes only (see the [license](https://huggingface.co./decapoda-research/llama-13b-hf/blob/main/LICENSE)).**

## Model Details

### Model Description

The decapoda-research/llama-13b-hf model was fine-tuned on conversations and question-answering prompts.

- **Developed by:** [More Information Needed]
- **Shared by:** [More Information Needed]
- **Model type:** Causal LM
- **Language(s) (NLP):** English, multilingual
- **License:** Research
- **Finetuned from model:** decapoda-research/llama-13b-hf

## Model Sources

- **Repository:** [More Information Needed]
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]

## Uses

The model can be used for prompt answering.

### Direct Use

The model can be used for prompt answering.

### Downstream Use

Generating text and prompt answering.

## Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

# Usage

## Creating prompt

The model was trained on the following kind of prompt:

```python
def generate_prompt(instruction: str, input_ctxt: str = None) -> str:
    if input_ctxt:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input_ctxt}

### Response:"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Response:"""
```

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import torch
from transformers import GenerationConfig, LlamaTokenizer, LlamaForCausalLM

tokenizer = LlamaTokenizer.from_pretrained("chainyo/alpaca-lora-7b")
model = LlamaForCausalLM.from_pretrained(
    "chainyo/alpaca-lora-7b",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

generation_config = GenerationConfig(
    temperature=0.2,
    top_p=0.75,
    top_k=40,
    num_beams=4,
    max_new_tokens=128,
)

model.eval()
if torch.__version__ >= "2":
    model = torch.compile(model)
```

### Example of Usage

```python
instruction = "What is the capital city of Greece and with which countries does Greece border?"
input_ctxt = None  # For some tasks, you can provide an input context to help the model generate a better response.

prompt = generate_prompt(instruction, input_ctxt)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
input_ids = input_ids.to(model.device)

with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        generation_config=generation_config,
        return_dict_in_generate=True,
        output_scores=True,
    )

response = tokenizer.decode(outputs.sequences[0], skip_special_tokens=True)
print(response)
>>> The capital city of Greece is Athens and it borders Albania, Macedonia, Bulgaria and Turkey.
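# Optional post-processing: the decoded `response` above contains the prompt
# followed by the model's answer. To keep only the newly generated text,
# slice off the prompt tokens before decoding:
answer = tokenizer.decode(outputs.sequences[0][input_ids.shape[-1]:], skip_special_tokens=True)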
```

## Training Details

### Training Data

The decapoda-research/llama-13b-hf model was fine-tuned on conversations and question-answering data.

### Training Procedure

The decapoda-research/llama-13b-hf model was further trained and fine-tuned on question-answering and prompt data for 1 epoch (approximately 10 hours of training on a single GPU).

## Model Architecture and Objective

The model is based on decapoda-research/llama-13b-hf, with fine-tuned adapters trained on top of the base model on conversations and question-answering data.
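The exact fine-tuning script and hyperparameters are not published in this card. As a rough sketch of what adapter-based (PEFT/LoRA) fine-tuning on this base model could look like, see below; the LoRA rank, target modules, and other settings are illustrative assumptions, not the values actually used.

```python
# Illustrative sketch only: the actual training setup is not published in this
# card, so the LoRA settings below are assumptions.
import torch
from transformers import LlamaForCausalLM
from peft import LoraConfig, get_peft_model

base_model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-13b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Hypothetical LoRA configuration; rank, alpha, and target modules are assumptions.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trained
# The adapters would then be trained for 1 epoch on the conversation and
# question-answering data (roughly 10 hours on a single GPU, per this card).
```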
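At inference time, such adapters are typically loaded on top of the base model. The sketch below assumes a PEFT adapter checkpoint whose path is a placeholder, since this card does not state where the fine-tuned adapter weights are hosted.

```python
# Illustrative sketch only: "path/to/finetuned-adapters" is a placeholder for
# the actual adapter checkpoint, which is not named in this card.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-13b-hf")
base_model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-13b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Apply the fine-tuned adapters on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, "path/to/finetuned-adapters")
model.eval()
```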