---
library_name: peft
datasets:
- databricks/databricks-dolly-15k
language:
- en
pipeline_tag: text-generation
---

# ctrltokyo/llama-2-7b-hf-dolly-flash-attention

This model is a fine-tuned version of [NousResearch/Llama-2-7b-hf](https://huggingface.co./NousResearch/Llama-2-7b-hf) on the databricks/databricks-dolly-15k dataset, with all training performed using Flash Attention 2. No further testing or optimisation has been performed.

## Model description

Just like [ctrltokyo/llm_prompt_mask_fill_model](https://huggingface.co./ctrltokyo/llm_prompt_mask_fill_model), this model could be used for live autocompletion of prompts, but it is intended primarily as a generalized chatbot (hence the use of the Dolly 15k dataset). Don't try it on code, because it won't work. I plan to release a further fine-tuned version using the [code_instructions_120k](https://huggingface.co./datasets/sahil2801/code_instructions_120k) dataset.

## Intended uses & limitations

Use as intended.

## Training and evaluation data

No evaluation was performed. Training was done on an NVIDIA A100; inference on the raw model appears to use around 20 GB of VRAM.

## Training procedure

The following `bitsandbytes` quantization config was used during training:

- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: fp4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float32

### Framework versions

- PEFT 0.4.0
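
## Example usage

A minimal inference sketch, not an official snippet from this repository: it assumes the PEFT adapter published here is loaded on top of the base model with a `BitsAndBytesConfig` mirroring the training-time settings listed above. The instruction-style prompt template is only an assumption, since the card does not specify one.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "NousResearch/Llama-2-7b-hf"
adapter_id = "ctrltokyo/llama-2-7b-hf-dolly-flash-attention"

# 4-bit config mirroring the training-time bitsandbytes settings listed above
# (fp4 quantization, no double quantization, float32 compute dtype).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.float32,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach the LoRA/PEFT adapter weights on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Assumed instruction/response prompt format; adjust to whatever template works best.
prompt = "### Instruction:\nExplain what a fine-tuned language model is.\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Since training used Flash Attention 2, recent versions of `transformers` could optionally pass `attn_implementation="flash_attention_2"` to `from_pretrained` for faster inference, but this is not required to use the adapter.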