---
library_name: peft
datasets:
- databricks/databricks-dolly-15k
language:
- en
pipeline_tag: text-generation
---

# ctrltokyo/llama-2-7b-hf-dolly-flash-attention

This model is a fine-tuned version of [NousResearch/Llama-2-7b-hf](https://huggingface.co./NousResearch/Llama-2-7b-hf) on the databricks/databricks-dolly-15k dataset, with all training performed using Flash Attention 2. No further testing or optimisation has been performed.

## Model description

Just like [ctrltokyo/llm_prompt_mask_fill_model](https://huggingface.co./ctrltokyo/llm_prompt_mask_fill_model), this model could be used for live autocompletion of prompts, but it is intended primarily as a generalized chatbot (hence the use of the Dolly 15k dataset). Don't try it on code, because it won't work. I plan to release a further fine-tuned version using the [code_instructions_120k](https://huggingface.co./datasets/sahil2801/code_instructions_120k) dataset.

## Intended uses & limitations

Use as intended.

## Training and evaluation data

No evaluation was performed. Training was done on an NVIDIA A100; inference on the raw model appears to use around 20 GB of VRAM.

## Training procedure

The following `bitsandbytes` quantization config was used during training:

- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32

### Framework versions

- PEFT 0.4.0
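
## Example usage

A minimal inference sketch, not an official snippet from this repository: it assumes the PEFT adapter published here is loaded on top of the base model with a `BitsAndBytesConfig` mirroring the training-time settings listed above. The instruction-style prompt template is only an assumption, since the card does not specify one.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "NousResearch/Llama-2-7b-hf"
adapter_id = "ctrltokyo/llama-2-7b-hf-dolly-flash-attention"

# 4-bit config mirroring the training-time bitsandbytes settings listed above
# (fp4 quantization, no double quantization, float32 compute dtype).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float32,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA/PEFT adapter weights on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Assumed instruction/response prompt format; adjust to whatever template works best.
prompt = "### Instruction:\nExplain what a fine-tuned language model is.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Since training used Flash Attention 2, recent versions of `transformers` could optionally pass `attn_implementation="flash_attention_2"` to `from_pretrained` for faster inference, but this is not required to use the adapter.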