Update README.md
README.md
CHANGED
````diff
@@ -7,8 +7,8 @@ pipeline_tag: text-generation
 # Falcon-40b-chat-oasst1
 
 Falcon-40b-chat-oasst1 is a chatbot-like model for dialogue generation. It was built by fine-tuning [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.
-
-- The training relied on a
 - Training took approximately 10 hours and was executed on a workstation with a single A100-SXM NVIDIA GPU with 37 GB of available memory (via Google Colab).
 - See attached [Notebook](https://huggingface.co/dfurman/falcon-40b-chat-oasst1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparams) used to train the model.
 
@@ -94,15 +94,11 @@ We recommend users of this model to develop guardrails and to take appropriate p
 
 ### Setup
 ```python
-# Install
 !pip install -q -U bitsandbytes loralib einops
 !pip install -q -U git+https://github.com/huggingface/transformers.git
 !pip install -q -U git+https://github.com/huggingface/peft.git
 !pip install -q -U git+https://github.com/huggingface/accelerate.git
-
-import torch
-from peft import PeftModel, PeftConfig
-from transformers import AutoModelForCausalLM, AutoTokenizer
 ```
 
 ### GPU Inference in 4-bit
@@ -110,6 +106,10 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 This requires a GPU with at least 27GB memory.
 
 ```python
 # load the model
 peft_model_id = "dfurman/falcon-40b-chat-oasst1"
 config = PeftConfig.from_pretrained(peft_model_id)
@@ -133,9 +133,7 @@ tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
 tokenizer.pad_token = tokenizer.eos_token
 
 model = PeftModel.from_pretrained(model, peft_model_id)
-```
 
-```python
 # run the model
 prompt = """<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
 <bot>:"""
````
````diff
@@ -7,8 +7,8 @@ pipeline_tag: text-generation
 # Falcon-40b-chat-oasst1
 
 Falcon-40b-chat-oasst1 is a chatbot-like model for dialogue generation. It was built by fine-tuning [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.
+- The model was fine-tuned in 4-bit precision using `peft`, `transformers`, and `bitsandbytes`.
+- The training relied on a method called "Low Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant. Instead of fine-tuning the entire model you fine-tune lightweight adapters and load them inside the base model at inference.
 - Training took approximately 10 hours and was executed on a workstation with a single A100-SXM NVIDIA GPU with 37 GB of available memory (via Google Colab).
 - See attached [Notebook](https://huggingface.co/dfurman/falcon-40b-chat-oasst1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparams) used to train the model.
 
````
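The LoRA idea in the new bullet can be illustrated with a tiny, framework-free sketch: the frozen base weight `W` is left untouched, and a trainable low-rank update `B @ A`, scaled by `alpha / r` as in the LoRA paper, is added to its output. This is a toy illustration only, not the training code for this model (which, per the README, uses `peft` and `bitsandbytes`); the helper names, shapes, and values are hypothetical, and the 4-bit quantization of the frozen weights that distinguishes QLoRA is deliberately left out.

```python
# Toy LoRA forward pass in pure Python (illustrative; not the peft implementation).
# Frozen weight W is (d_out x d_in); adapter A is (r x d_in) and B is (d_out x r).


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """Return W x + (alpha / r) * B (A x)."""
    base = matvec(W, x)                  # frozen pretrained path
    low_rank = matvec(B, matvec(A, x))   # adapter path: project down to r, then back up
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, low_rank)]


def count_params(d_out, d_in, r):
    """Trainable parameters for a full update vs. a rank-r adapter on one matrix."""
    return d_out * d_in, r * (d_in + d_out)


if __name__ == "__main__":
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 2 x 3, frozen
    A = [[0.5, 0.5, 0.5]]                   # 1 x 3 (r = 1), trainable
    B = [[0.0], [0.0]]                      # 2 x 1, zero-initialized: adapter starts as a no-op
    x = [1.0, 2.0, 3.0]
    print(lora_forward(W, A, B, x, alpha=16, r=1))  # [1.0, 2.0]
    B = [[1.0], [2.0]]                      # after some (pretend) training
    print(lora_forward(W, A, B, x, alpha=2, r=1))   # [7.0, 14.0]
    # Hypothetical 8192 x 8192 projection with rank 16
    print(count_params(8192, 8192, 16))             # (67108864, 262144)
```

Because `B` starts at zero, the adapted model initially behaves exactly like the base model, and only the small `A` and `B` matrices need to be trained and shipped, which is why an adapter repository can be loaded on top of the separately downloaded base weights.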
````diff
@@ -94,15 +94,11 @@ We recommend users of this model to develop guardrails and to take appropriate p
 
 ### Setup
 ```python
+# Install packages
 !pip install -q -U bitsandbytes loralib einops
 !pip install -q -U git+https://github.com/huggingface/transformers.git
 !pip install -q -U git+https://github.com/huggingface/peft.git
 !pip install -q -U git+https://github.com/huggingface/accelerate.git
 ```
 
 ### GPU Inference in 4-bit
````
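After the Setup cell, a quick sanity check that the installed packages are importable can save a failed run later. The sketch below uses only the standard library's `importlib.util.find_spec`; the helper name `check_packages` is hypothetical, and it assumes each package's import name matches the pip name used in the install lines above.

```python
import importlib.util

# Import names assumed to match the pip package names in the Setup cell.
REQUIRED = ["bitsandbytes", "loralib", "einops", "transformers", "peft", "accelerate"]


def check_packages(names):
    """Map each top-level module name to True if it can be found on sys.path."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_packages(REQUIRED)
    missing = [name for name, ok in status.items() if not ok]
    print("missing:", missing or "none")
```

`find_spec` only locates the module without importing it, so the check is cheap; it does not verify versions or that `bitsandbytes` can see a GPU.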
````diff
@@ -110,6 +106,10 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 This requires a GPU with at least 27GB memory.
 
 ```python
+import torch
+from peft import PeftModel, PeftConfig
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
 # load the model
 peft_model_id = "dfurman/falcon-40b-chat-oasst1"
 config = PeftConfig.from_pretrained(peft_model_id)
````
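The "at least 27GB" requirement can be put in perspective with back-of-the-envelope arithmetic on weight storage alone. The sketch assumes a round 40e9 parameters (not the exact size of Falcon-40B) and ignores quantization constants, activations, the KV cache, the adapter weights, and framework overhead, all of which presumably account for the gap between the weight estimate and the stated figure.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in GB (10**9 bytes), ignoring all overheads."""
    return n_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    n = 40e9  # assumed round parameter count
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {weight_memory_gb(n, bits):.0f} GB")
    # 16-bit: 80 GB
    #  8-bit: 40 GB
    #  4-bit: 20 GB
```

On this estimate, 16-bit weights alone would need roughly 80 GB, which would not fit on a GPU of the size mentioned above, while 4-bit weights leave several GB of headroom for everything else.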
````diff
@@ -133,9 +133,7 @@ tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
 tokenizer.pad_token = tokenizer.eos_token
 
 model = PeftModel.from_pretrained(model, peft_model_id)
 
 # run the model
 prompt = """<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
 <bot>:"""
````
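The example prompt puts the user turn after `<human>: ` and ends with `<bot>:` on a new line, leaving the model to complete the bot turn. A small pair of helpers for building prompts in that shape and trimming decoded output might look like the sketch below. The function names are hypothetical, and the template and the choice of `<human>:` as a stop marker are assumptions inferred from the example prompt, not documented behavior of the model.

```python
def build_prompt(message):
    """Wrap a user message in the <human>/<bot> turn format of the example prompt."""
    return f"<human>: {message}\n<bot>:"


def extract_reply(generated, prompt):
    """Drop the echoed prompt from decoded output and cut at the next <human>: turn."""
    text = generated[len(prompt):] if generated.startswith(prompt) else generated
    return text.split("<human>:", 1)[0].strip()


if __name__ == "__main__":
    p = build_prompt("Say hi.")
    print(repr(p))  # '<human>: Say hi.\n<bot>:'
    print(extract_reply(p + " Hi there!\n<human>: thanks", p))  # Hi there!
```

Cutting at the next `<human>:` guards against the model continuing the conversation on its own after finishing its reply.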