Update README.md
README.md CHANGED
@@ -17,9 +17,9 @@ tags:
 
 Bitnet-LLama-70M is a 70M parameter model trained using the method described in [The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits](https://arxiv.org/abs/2402.17764).
 
-It was trained on the subset of the [HuggingFaceTB/cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) dataset. This is just a small experiment to try out BitNet.
+It was trained on a subset of the [HuggingFaceTB/cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) dataset. This is just a small experiment to try out BitNet. Bitnet-LLama-70M was trained for 2 epochs on 1xA100.
 
-
+This model is just an experiment, and you might not get good results when chatting with it due to the small model size and limited training.
 
 Wandb training report is as follows:
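For background, the 1.58-bit method cited above constrains each weight to the ternary set {-1, 0, +1} using absmean quantization: weights are scaled by their mean absolute value, rounded, and clipped. Below is a minimal PyTorch sketch of that quantization step under those assumptions; the `weight_quant` helper is a hypothetical name for illustration, not code from this repository or the paper's release.

```python
import torch

def weight_quant(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Absmean ternary quantization (hypothetical sketch, per the
    BitNet b1.58 paper): scale by the mean absolute value, round,
    and clip weights to {-1, 0, +1}."""
    gamma = w.abs().mean().clamp(min=eps)   # per-tensor absmean scale
    w_q = (w / gamma).round().clamp(-1, 1)  # ternary values in {-1, 0, +1}
    return w_q * gamma                      # rescale for a drop-in replacement

# Usage: quantize a Linear layer's weight in place. During training the
# paper uses a straight-through estimator for gradients; omitted here.
lin = torch.nn.Linear(256, 256, bias=False)
with torch.no_grad():
    lin.weight.copy_(weight_quant(lin.weight))
```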