---
library_name: transformers
license: apache-2.0
datasets:
- abideen/Cosmopedia-100k-pretrain
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---
# 🚀 BitNet-Llama3 (from 8B to 2B) Transformation & Training
This project transforms an 8B-parameter Llama3 model into a 2B-parameter BitNet model by replacing its linear layers with BitLinear layers. The resulting model is then trained on a predefined dataset and uploaded to Hugging Face for future use.
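The repository's conversion utilities are not reproduced here, but the core idea is the `BitLinear` layer. Below is a minimal, hypothetical sketch in the style of BitNet b1.58 (assumptions: absmean ternary weight quantization, per-token 8-bit absmax activation quantization, and a straight-through estimator for training); the actual `utils.bitnet_transformation` module may differ in detail.

```python
import torch.nn as nn
import torch.nn.functional as F

class BitLinear(nn.Linear):
    """Sketch of a BitNet b1.58-style linear layer (illustrative, not the repo's code)."""

    def forward(self, x):
        w = self.weight
        # Absmean quantization: scale weights, round to {-1, 0, +1}, then rescale
        w_scale = 1.0 / w.abs().mean().clamp(min=1e-5)
        w_q = (w * w_scale).round().clamp(-1, 1) / w_scale
        w_q = w + (w_q - w).detach()  # straight-through estimator for gradients
        # Per-token absmax quantization of activations to 8 bits
        x_scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
        x_q = (x * x_scale).round().clamp(-128, 127) / x_scale
        x_q = x + (x_q - x).detach()
        return F.linear(x_q, w_q, self.bias)
```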
---
### Model Description
<!-- Provide a longer summary of what this model is. -->
This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.
- **Developed by:** [email protected]
- **Funded by [optional]:** ITCL
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Llama3 8B transformed to BitNet
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model [optional]:** meta-llama/Llama-3.1-8B-Instruct
### Model Sources [optional]
<!-- Provide the basic links for the model. -->
- **Repository:** [ejbejaranos/Bitnet-Llama3-from8BM-now2B](https://huggingface.co/ejbejaranos/Bitnet-Llama3-from8BM-now2B)
## 📝 Description
This repository includes scripts to:
1. 🎯 Transform a Llama3 model into a BitNet architecture (a sketch of the layer swap follows this list).
2. 💻 Train the model using Hugging Face and Weights & Biases.
3. 🚀 Upload the transformed and trained model to Hugging Face for inference and future use.
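The layer swap in step 1 conceptually walks the module tree and replaces each `nn.Linear` with a `BitLinear` of the same shape (see the `BitLinear` sketch above). A hypothetical sketch of what the repository's `replace_linears_in_hf` does; the real implementation may handle more cases, such as skipping the LM head:

```python
import torch.nn as nn

def replace_linears(module: nn.Module) -> None:
    """Recursively replace every nn.Linear with a same-shaped BitLinear (sketch above)."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear) and not isinstance(child, BitLinear):
            bit = BitLinear(child.in_features, child.out_features,
                            bias=child.bias is not None)
            bit.weight = child.weight      # reuse the pretrained parameters
            if child.bias is not None:
                bit.bias = child.bias
            setattr(module, name, bit)
        else:
            replace_linears(child)
```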
---
## ⚙️ Requirements
- Python 3.8+
- PyTorch 1.10+
- Transformers 4.0+
- Hugging Face Hub API
- Weights & Biases
---
## 🧰 Installation
Make sure you have all required dependencies installed:
```bash
pip install torch transformers datasets wandb huggingface_hub
```
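If the model repository or W&B logging requires authentication, you can log in first using the standard `huggingface_hub` and `wandb` CLIs:

```bash
huggingface-cli login   # paste your HF access token when prompted
wandb login             # paste your W&B API key when prompted
```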
## 📥 How to Use
1. Using the trained model for inference
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from utils.bitnet_transformation import replace_linears_in_hf

model_id = "ejbejaranos/Bitnet-Llama3-from8BM-now2B"

# Load the BitNet model (pass a token if the repository requires authentication)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    token="YOUR_HF_TOKEN",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Swap the linear layers for BitLinear layers before running inference
replace_linears_in_hf(model)

# Set up for inference
model.to("cuda:0")
prompt = "What is Machine Learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generate_ids = model.generate(inputs.input_ids, max_length=50)
output = tokenizer.batch_decode(
    generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(output)
```
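The greedy call above is the minimal path; for more varied text you can pass the standard `transformers` sampling arguments to `generate`, for example:

```python
generate_ids = model.generate(
    inputs.input_ids,
    max_new_tokens=100,   # generate up to 100 new tokens instead of capping total length
    do_sample=True,       # sample instead of greedy decoding
    temperature=0.7,
    top_p=0.9,
)
```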
---
## 🧑‍🔬 Metrics
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6419c2f6b4adb0e101b17b6c/nCE1-KLDWDqSCmPtDMmWa.png)
During training, the following metrics were logged to Weights & Biases (final values shown):
- `final_loss`: 1.4
- `final_perplexity`: 4.2
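As a quick consistency check (an aside, not part of the training scripts): perplexity is the exponential of the mean cross-entropy loss, so the two values above should roughly agree.

```python
import math

final_loss = 1.4
print(f"perplexity ≈ {math.exp(final_loss):.2f}")  # ≈ 4.06, in the range of the reported 4.2
```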
---
## 🎯 Future Goals
- Implement additional quantization layers for inference.
- Test the model on different datasets and contexts.
---
## 📢 Contact
If you have questions, suggestions, or improvements, feel free to open an Issue or contact us through [Hugging Face](https://huggingface.co./ejbejaranos).
---
## Environmental Impact
<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** [More Information Needed]
- **Hours used:** [More Information Needed]
- **Cloud Provider:** [More Information Needed]
- **Compute Region:** [More Information Needed]
- **Carbon Emitted:** [More Information Needed]
## 💡 Acknowledgments
Thanks to [Hugging Face](https://huggingface.co./) and [Weights & Biases](https://wandb.ai/) for providing support and tools. |