SebastianBodza committed
Commit a7a4a14 • 1 Parent(s): 1967b9d
Update README.md
README.md CHANGED
@@ -51,6 +51,8 @@ txt = model.generate(**txt,
 eos_token_id=tokenizer.eos_token_id)
 tokenizer.decode(txt[0], skip_special_tokens=True)
 ```
+## Limitations:
+Gradient accumulation led to divergence after a couple of steps. We therefore reduced the block size to 1024 and used two RTX 3090s to get a batch size of 4, which is probably too small to generalize well.
 ## Training:
 Training was based on Llama-X with adaptations of WizardLM's training script and additional adjustments for QLoRA tuning. MPT code from <a href="https://huggingface.co/SebastianBodza/mpt-30B-qlora-multi_GPU">SebastianBodza/mpt-30B-qlora-multi_GPU</a>
 
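The added "Limitations" lines describe the batch setup only in prose. Below is a minimal sketch of how that setup might be expressed with Hugging Face `TrainingArguments`: the block size of 1024 and the effective batch size of 4 across two RTX 3090s come from the diff, while all remaining names and values are placeholders rather than the repository's actual settings.

```python
# Sketch of the batch configuration described in the "Limitations" section.
# Only block_size, the two-GPU setup and the effective batch size of 4 come
# from the README; output_dir and num_train_epochs are placeholders.
from transformers import TrainingArguments

block_size = 1024  # reduced sequence length used when chunking the training data

training_args = TrainingArguments(
    output_dir="./checkpoints",      # placeholder path
    per_device_train_batch_size=2,   # 2 per GPU x 2 RTX 3090 = effective batch size of 4
    gradient_accumulation_steps=1,   # accumulation disabled after it led to divergence
    num_train_epochs=1,              # placeholder
)
```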
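The "Training" line refers to a QLoRA tune built on Llama-X with WizardLM's script, but the script itself is not part of this diff. The following is therefore only a generic sketch of a 4-bit QLoRA setup with transformers, bitsandbytes and peft; the base model name, LoRA rank/alpha and target modules are illustrative assumptions, not values taken from the linked SebastianBodza/mpt-30B-qlora-multi_GPU code.

```python
# Generic 4-bit QLoRA setup (transformers + bitsandbytes + peft).
# Base model, LoRA rank/alpha and target modules are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # quantize base weights to 4 bit for QLoRA
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",           # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,                                 # assumed rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```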