Update README.md
README.md
````diff
@@ -150,7 +150,9 @@ If you *really* want to use `<|im_start|>` and `<|im_end|>`, just update your `t
 {instruction} [/INST]
 ```
 
-
+### Fine-tune
+
+*Note: I actually used my fork of [qlora](https://github.com/jondurbin/qlora)'s `train.py` for this, but I'm porting it to a minified version here, not tested yet!*
 
 ```bash
 export BASE_DIR=/workspace
@@ -158,7 +160,7 @@ export WANDB_API_KEY=[redacted]
 export WANDB_PROJECT=bagel-7b-v0.1
 
 # Run the pretraining.
-accelerate launch
+accelerate launch bagel/tune/sft.py \
   --model_name_or_path $BASE_DIR/mistral-7b \
   --final_output_dir $BASE_DIR/$WANDB_PROJECT \
   --output_dir $BASE_DIR/$WANDB_PROJECT-workdir \
@@ -219,6 +221,4 @@ Deepspeed configuration:
     "allgather_bucket_size": 5e8
   }
 }
-```
-
-This was done in runpod on an 8x 80gb a100 instance. I actually stopped the fine tune at around 50% due to budget constraints.
+```
````
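For readability, here is the post-commit launch sequence pieced together from the hunks above. It includes only the environment variables and flags actually visible in this diff; the command in the full README continues past `--output_dir`, so treat this as a partial sketch rather than the complete invocation:

```bash
# Pieced together from the diff hunks above; the real command in the README
# continues past --output_dir, so this is only the visible portion.
export BASE_DIR=/workspace
export WANDB_API_KEY=[redacted]   # substitute your own key
export WANDB_PROJECT=bagel-7b-v0.1

accelerate launch bagel/tune/sft.py \
  --model_name_or_path $BASE_DIR/mistral-7b \
  --final_output_dir $BASE_DIR/$WANDB_PROJECT \
  --output_dir $BASE_DIR/$WANDB_PROJECT-workdir
```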
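The last hunk shows only the tail of the README's `Deepspeed configuration:` block. As a rough sketch (not taken from this commit), a minimal ZeRO stage-2 config ending in those exact lines could look like the following; every key except `allgather_bucket_size` is an assumption:

```json
{
  "bf16": {
    "enabled": true
  },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_bucket_size": 5e8,
    "allgather_bucket_size": 5e8
  }
}
```

The bucket sizes control how many gradient elements deepspeed groups into a single all-gather or reduce call; larger buckets mean fewer, bigger communication ops at the cost of extra temporary memory.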