Any Colab to reproduce the training?
Hi friend,
I was wondering if you have a Colab to reproduce your experiment of training BLOOM on MS MARCO.
It should be as easy as:
```bash
git clone https://github.com/Muennighoff/sgpt.git
pip install git+https://github.com/huggingface/accelerate
accelerate config
cd sgpt/biencoder/nli_msmarco
cd sentence-transformers; pip install -e .
# install the bundled GradCache, then go back to sentence-transformers/
cd sentence_transformers/losses/GradCache; pip install --editable .; cd ../../..
pip install wandb
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch examples/training/ms_marco/train_bi-encoder_mnrl.py --model_name bigscience/bloom-7b1 --train_batch_size 32 --eval_batch_size 16 --freezenonbias --specb --lr 4e-4 --wandb --wandbwatchlog gradients --pooling weightedmean --gradcache --chunksize 8
```
How many GPUs did you use for this training?
One more thing: I have tested your already fine-tuned BLOOM SGPT model for sentence-embedding asymmetric search, but it is not so good for the law domain. So I was thinking of making a new dataset like MS MARCO to get better results. What do you think about that? Thank you in advance.
The number of GPUs is defined in `accelerate config` & via `CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7` - here I'm using 8 (A100s with 80GB). You can use far fewer, but it will take longer. If you run out of memory, decrease `--chunksize`.
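To give an intuition for why `--chunksize` only affects memory: with `--gradcache`, the contrastive loss is still computed over the full batch, but the encoder only ever processes `--chunksize` examples at a time. Here is a simplified PyTorch sketch of the gradient-caching idea, not the actual GradCache code used by the script; `encoder`, `batch`, `loss_fn` and `optimizer` are placeholders:

```python
import torch

def gradcache_step(encoder, batch, chunk_size, loss_fn, optimizer):
    """One training step with gradient caching (simplified sketch)."""
    chunks = [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]

    # 1) Encode every chunk WITHOUT building a graph -> embeddings for the full batch.
    with torch.no_grad():
        reps = torch.cat([encoder(c) for c in chunks])
    reps.requires_grad_()  # leaf tensor; will receive d(loss)/d(embeddings)

    # 2) Contrastive loss over the FULL batch; cache the embedding gradients.
    loss = loss_fn(reps)
    loss.backward()  # only fills reps.grad, the encoder is untouched so far
    grad_chunks = reps.grad.split([len(c) for c in chunks])

    # 3) Re-encode chunk by chunk WITH gradients and inject the cached gradients.
    #    Peak activation memory now scales with chunk_size, not the batch size.
    for c, g in zip(chunks, grad_chunks):
        encoder(c).backward(gradient=g)

    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()
```

So the result matches full-batch training; lowering the chunk size only trades speed for memory.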
Yeah, I think it could help, especially if you have negatives. I.e., for each sample you want both a passage that the embedding should be close to & one it should be far away from.
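For the data format: `train_bi-encoder_mnrl.py` trains with MultipleNegativesRankingLoss on (query, positive passage, hard negative) triples, so a law-domain dataset in that shape should slot in. A minimal sentence-transformers sketch just to illustrate the format (the model name and example strings are placeholders; the actual script takes care of BLOOM, the special brackets, weighted-mean pooling & GradCache):

```python
from torch.utils.data import DataLoader
from sentence_transformers import InputExample, SentenceTransformer, losses

# Each training sample is a triple: (query, passage it should be close to, hard negative).
train_samples = [
    InputExample(texts=[
        "statute of limitations for breach of written contract",                   # query
        "Actions upon a written contract must be commenced within six years ...",  # relevant passage
        "A contract is an agreement creating obligations enforceable by law ...",  # hard negative: on-topic, but not an answer
    ]),
    # ... many more triples mined from your law corpus
]

model = SentenceTransformer("distilbert-base-uncased")  # placeholder model, just to show the API
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=32)
train_loss = losses.MultipleNegativesRankingLoss(model)  # in-batch negatives + the provided hard negative
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```

Hard negatives that look relevant but don't actually answer the query tend to help the most.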
Thank you so much friend!
How much did it cost to run the training? And how long did it take?
Thanks
How much it costs depends on your cloud provider; in my case, using those 8 A100s w/ 80GB, it took about 5 hours.