nmitchko committed on
Commit
cc70246
1 Parent(s): 1908b21

Initial Upload

Files changed (4)
  1. README.md +153 -0
  2. adapter_config.json +22 -0
  3. adapter_model.bin +3 -0
  4. img.png +0 -0
README.md ADDED
@@ -0,0 +1,153 @@
+ ---
+ language:
+ - en
+ library_name: peft
+ pipeline_tag: text-generation
+ tags:
+ - medical
+ license: cc-by-nc-3.0
+ ---
+
+ # MedFalcon v2.1a 40b LoRA - Step 4500
+
+ ![img.png](img.png)
+
+ ## Model Description
+
+ This is a model checkpoint released at 4500 steps. For evaluation use only! Limitations:
+ * LoRA output will be more concise than the base model
+ * Due to the LoRA's size, base knowledge from falcon-40b may be overwritten
+ * Due to the LoRA's size, more hardware may be required to load falcon-40b when using this LoRA
+
+ ### Architecture
+ `nmitchko/medfalconv2-1a-40b-lora` is a LoRA for a large language model, fine-tuned specifically for medical domain tasks.
+ It is based on [`Falcon-40b`](https://huggingface.co/tiiuae/falcon-40b) at 40 billion parameters.
+
+ The primary goal of this model is to improve question-answering and medical dialogue tasks.
+ It was trained using [LoRA](https://arxiv.org/abs/2106.09685), specifically [QLoRA](https://github.com/artidoro/qlora), to reduce memory footprint.
+
+ See Training Parameters for more info. This LoRA supports 4-bit and 8-bit modes.
+ ### Requirements
+
+ ```
+ bitsandbytes>=0.39.0
+ peft
+ transformers
+ ```
+
+ Steps to load this model:
+ 1. Load the base model using transformers
+ 2. Apply the LoRA using peft
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import transformers
+ import torch
+ from peft import PeftModel
+
+ model_name = "tiiuae/falcon-40b"
+ LoRA = "nmitchko/medfalconv2-1a-40b-lora"
+
+ # If you want 8-bit or 4-bit, set the appropriate flags
+ load_8bit = True
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Load the quantized base model
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     load_in_8bit=load_8bit,
+     torch_dtype=torch.float16,
+     trust_remote_code=True,
+     device_map="auto",
+ )
+
+ # Apply the LoRA adapter on top of the base model
+ model = PeftModel.from_pretrained(model, LoRA)
+
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+ )
+
+ sequences = pipeline(
+     "What does the drug ceftriaxone do?\nDoctor:",
+     max_length=200,
+     do_sample=True,
+     top_k=40,
+     num_return_sequences=1,
+     eos_token_id=tokenizer.eos_token_id,
+ )
+
+ for seq in sequences:
+     print(f"Result: {seq['generated_text']}")
+ ```
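+
+ Because the LoRA was trained with QLoRA in 4-bit NF4, the base model can also be loaded in 4 bits to reduce memory further. Below is a minimal sketch, assuming your installed transformers version exposes `BitsAndBytesConfig`; adjust the compute dtype to your hardware.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+ import torch
+ from peft import PeftModel
+
+ # 4-bit NF4 with double quantization, mirroring the training setup below
+ bnb_config = BitsAndBytesConfig(
+     load_in_4bit=True,
+     bnb_4bit_quant_type="nf4",
+     bnb_4bit_use_double_quant=True,
+     bnb_4bit_compute_dtype=torch.bfloat16,
+ )
+
+ model_name = "tiiuae/falcon-40b"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+
+ # Load the base model in 4-bit, then apply the LoRA adapter
+ base = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     quantization_config=bnb_config,
+     trust_remote_code=True,
+     device_map="auto",
+ )
+ model = PeftModel.from_pretrained(base, "nmitchko/medfalconv2-1a-40b-lora")
+ ```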
+
+ ## Training Parameters
+
+ The model was trained for 4500 steps (about one epoch) on a custom, unreleased dataset named `medconcat`.
+ `medconcat` contains only human-generated content and weighs in at over 100 MiB of raw text.
+
+ Training ran in 4-bit mode with a rather large LoRA; the hyperparameters are summarized below, and the full bash invocation follows the table:
+
+ | Item          | Amount | Units |
+ |---------------|--------|-------|
+ | LoRA Rank     | 128    | ~     |
+ | LoRA Alpha    | 256    | ~     |
+ | Learning Rate | 1e-3   | ~     |
+ | Dropout       | 5      | %     |
+
+ ```bash
+ CURRENTDATEONLY=$(date +"%b %d %Y")
+
+ # Cap GPU 1's power draw at 250 W
+ sudo nvidia-smi -i 1 -pl 250
+
+ # Train on a single GPU
+ export CUDA_VISIBLE_DEVICES=0
+
+ nohup python qlora.py \
+     --model_name_or_path models/tiiuae_falcon-40b \
+     --output_dir ./loras/medfalcon2.1a-40b \
+     --logging_steps 100 \
+     --save_strategy steps \
+     --data_seed 42 \
+     --save_steps 200 \
+     --save_total_limit 40 \
+     --evaluation_strategy steps \
+     --eval_dataset_size 1024 \
+     --max_eval_samples 1000 \
+     --per_device_eval_batch_size 1 \
+     --max_new_tokens 32 \
+     --dataloader_num_workers 3 \
+     --group_by_length \
+     --logging_strategy steps \
+     --remove_unused_columns False \
+     --do_train \
+     --lora_r 128 \
+     --lora_alpha 256 \
+     --lora_modules all \
+     --double_quant \
+     --quant_type nf4 \
+     --bf16 \
+     --bits 4 \
+     --warmup_ratio 0.03 \
+     --lr_scheduler_type constant \
+     --gradient_checkpointing \
+     --dataset="training/datasets/medconcat/" \
+     --dataset_format alpaca \
+     --trust_remote_code=True \
+     --source_max_len 16 \
+     --target_max_len 512 \
+     --per_device_train_batch_size 1 \
+     --gradient_accumulation_steps 16 \
+     --max_steps 4500 \
+     --eval_steps 1000 \
+     --learning_rate 0.0001 \
+     --adam_beta2 0.999 \
+     --max_grad_norm 0.3 \
+     --lora_dropout 0.05 \
+     --weight_decay 0.0 \
+     --seed 0 > "${CURRENTDATEONLY}-finetune-medfalcon2.1a.log" &
+ ```
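+
+ For reference, the adapter shape produced by this run (recorded in `adapter_config.json` in this repo) corresponds roughly to the following peft `LoraConfig`. This is a sketch for illustration, not the exact object QLoRA constructs internally:
+
+ ```python
+ from peft import LoraConfig
+
+ # Mirrors the values recorded in adapter_config.json
+ lora_config = LoraConfig(
+     r=128,
+     lora_alpha=256,
+     lora_dropout=0.05,
+     bias="none",
+     task_type="CAUSAL_LM",
+     target_modules=["query_key_value", "dense", "dense_h_to_4h", "dense_4h_to_h"],
+ )
+ ```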
adapter_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "base_model_name_or_path": "models/tiiuae_falcon-40b",
+   "bias": "none",
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "lora_alpha": 256.0,
+   "lora_dropout": 0.05,
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "r": 128,
+   "revision": null,
+   "target_modules": [
+     "dense",
+     "dense_h_to_4h",
+     "dense_4h_to_h",
+     "query_key_value"
+   ],
+   "task_type": "CAUSAL_LM"
+ }
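
The adapter configuration above can also be inspected programmatically. A small sketch, assuming the adapter is fetched from the `nmitchko/medfalconv2-1a-40b-lora` repo named in the README:

```python
from peft import PeftConfig

# Reads adapter_config.json and exposes its fields
config = PeftConfig.from_pretrained("nmitchko/medfalconv2-1a-40b-lora")
print(config.peft_type, config.base_model_name_or_path)
```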
adapter_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:53b6fb6a01432b56bb4d74e4eb593919a7950f6e28b633fc96c93c7b2c3ad0c0
+ size 1777513610
img.png ADDED