Commit 7a4ba7b by ricardo-larosa (parent: 896b902): Update README.md
This Mistral model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth).
[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

# Techniques used

1. Quantization: Unsloth provides 4-bit quantized models, which are 4x faster to download and use 4x less memory (I observed that the reduced precision did not noticeably affect the model's performance).
2. Low-Rank Adaptation (LoRA): Unsloth provides LoRA adapters that allow updating only 1 to 10% of all parameters (see the sketch after this list).
3. Rotary Positional Embedding (RoPE) Scaling: Unsloth supports RoPE scaling internally instead of traditional positional embeddings.
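
The sketch below shows how these three pieces typically come together when loading the base model with Unsloth. It is a minimal illustration, not the exact training script for this checkpoint: the base model name, `max_seq_length`, and the LoRA hyperparameters are assumed values taken from common Unsloth examples.

```python
from unsloth import FastLanguageModel

# 1) Quantization: load a 4-bit quantized Mistral base model.
#    RoPE scaling is handled internally by Unsloth when max_seq_length
#    exceeds the base model's native context window.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",  # assumed base model, not confirmed
    max_seq_length=2048,                       # assumed sequence length
    dtype=None,                                # auto-detect (bfloat16 on A100)
    load_in_4bit=True,                         # 4x smaller download / memory
)

# 2) LoRA: attach low-rank adapters so only a small fraction of the
#    parameters (roughly 1 to 10%) is trainable.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                      # LoRA rank (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing=True,
    random_state=3407,
)
```

From there, the usual fine-tuning loop applies; only the LoRA adapter weights are updated while the 4-bit base weights stay frozen.
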
# Performance
I did not see any OOMs, and memory usage was steady at 10 GB on an A100 GPU (I could easily have used a V100).
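
As a hedged aside (this was not part of the original run), peak GPU memory can be confirmed with PyTorch's built-in CUDA statistics:

```python
import torch

# Standalone check (not from the original training script): report the peak
# GPU memory reserved by PyTorch's caching allocator during the run.
if torch.cuda.is_available():
    peak_gb = torch.cuda.max_memory_reserved(device=0) / 1e9
    print(f"Peak reserved GPU memory: {peak_gb:.1f} GB")
```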