Update README.md
README.md
````diff
@@ -150,7 +150,9 @@ If you *really* want to use `<|im_start|>` and `<|im_end|>`, just update your `t
 {instruction} [/INST]
 ```
 
-
+### Fine-tune
+
+*Note: I actually used my fork of [qlora](https://github.com/jondurbin/qlora)'s `train.py` for this, but I'm porting it to a minified version here, not tested yet!*
 
 ```bash
 export BASE_DIR=/workspace
@@ -158,7 +160,7 @@ export WANDB_API_KEY=[redacted]
 export WANDB_PROJECT=bagel-7b-v0.1
 
 # Run the pretraining.
-accelerate launch
+accelerate launch bagel/tune/sft.py \
   --model_name_or_path $BASE_DIR/mistral-7b \
   --final_output_dir $BASE_DIR/$WANDB_PROJECT \
   --output_dir $BASE_DIR/$WANDB_PROJECT-workdir \
@@ -219,6 +221,4 @@ Deepspeed configuration:
     "allgather_bucket_size": 5e8
   }
 }
-```
-
-This was done in runpod on an 8x 80gb a100 instance. I actually stopped the fine tune at around 50% due to budget constraints.
+```
````
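For readability, here is the post-commit launch sequence pieced together from the hunks above. It includes only the environment variables and flags actually visible in this diff; the command in the full README continues past `--output_dir`, so treat this as a partial sketch rather than the complete invocation:

```bash
# Pieced together from the diff hunks above; the real command in the README
# continues past --output_dir, so this is only the visible portion.
export BASE_DIR=/workspace
export WANDB_API_KEY=[redacted]   # substitute your own key
export WANDB_PROJECT=bagel-7b-v0.1

accelerate launch bagel/tune/sft.py \
  --model_name_or_path $BASE_DIR/mistral-7b \
  --final_output_dir $BASE_DIR/$WANDB_PROJECT \
  --output_dir $BASE_DIR/$WANDB_PROJECT-workdir
```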
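The last hunk shows only the tail of the README's `Deepspeed configuration:` block. As a rough sketch (not taken from this commit), a minimal ZeRO stage-2 config ending in those exact lines could look like the following; every key except `allgather_bucket_size` is an assumption:

```json
{
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_bucket_size": 5e8,
    "allgather_bucket_size": 5e8
  }
}
```

The bucket sizes control how many gradient elements deepspeed groups into a single all-gather or reduce call; larger buckets mean fewer, bigger communication ops at the cost of extra temporary memory.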