Update README.md
README.md CHANGED
@@ -17,9 +17,9 @@ tags:
 
 Bitnet-LLama-70M is a 70M parameter model trained using the method described in [The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits](https://arxiv.org/abs/2402.17764).
 
-It was trained on the subset of the [HuggingFaceTB/cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) dataset. This is just a small experiment to try out BitNet.
+It was trained on a subset of the [HuggingFaceTB/cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) dataset. This is just a small experiment to try out BitNet. Bitnet-LLama-70M was trained for 2 epochs on 1xA100.
 
-
+This model is just an experiment, and you might not get good results when chatting with it due to the small model size and limited training.
 
 Wandb training report is as follows:
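For background, the 1.58-bit method cited above constrains each weight to the ternary set {-1, 0, +1} using absmean quantization: weights are scaled by their mean absolute value, rounded, and clipped. Below is a minimal PyTorch sketch of that quantization step under those assumptions; the `weight_quant` helper is a hypothetical name for illustration, not code from this repository or the paper's release.

```python
import torch

def weight_quant(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Absmean ternary quantization (hypothetical sketch, per the
    BitNet b1.58 paper): scale by the mean absolute value, round,
    and clip weights to {-1, 0, +1}."""
    gamma = w.abs().mean().clamp(min=eps)   # per-tensor absmean scale
    w_q = (w / gamma).round().clamp(-1, 1)  # ternary values in {-1, 0, +1}
    return w_q * gamma                      # rescale for a drop-in replacement

# Usage: quantize a Linear layer's weight in place. During training the
# paper uses a straight-through estimator for gradients; omitted here.
lin = torch.nn.Linear(256, 256, bias=False)
with torch.no_grad():
    lin.weight.copy_(weight_quant(lin.weight))
```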