metadata

library_name: transformers
license: apache-2.0
datasets:
  - abideen/Cosmopedia-100k-pretrain
language:
  - en
base_model:
  - meta-llama/Llama-3.1-8B-Instruct

🚀 BitNet-Llama3 (from 8B to 2B) Transformation & Training

This project converts an 8B-parameter Llama3 model into a 2B-parameter BitNet architecture that uses BitLinear layers. The converted model is then trained on the abideen/Cosmopedia-100k-pretrain dataset and uploaded to Hugging Face for future use.

Model Description

This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

Developed by: [email protected]
Funded by [optional]: ITCL
Shared by [optional]: [More Information Needed]
Model type: Llama3 8B transformed to BitNet
Language(s) (NLP): English
License: apache-2.0
Finetuned from model [optional]: meta-llama/Llama-3.1-8B-Instruct

Model Sources [optional]

Repository: ejbejaranos/Bitnet-Llama3-from8BM-now2B

📄 Description

This repository includes scripts to:

🎯 Transform a Llama3 model to a BitNet architecture.
💻 Train the model using Hugging Face and Weights & Biases.
🚀 Upload the transformed and trained model to Hugging Face for inference and future use.
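
The transformation step listed above can be pictured roughly as follows. This is a minimal sketch, assuming BitLinear follows the commonly used BitNet b1.58 recipe (ternary absmean weights, 8-bit absmax activations, and a straight-through estimator for gradients). The repository's actual implementation is not shown in this card, so `BitLinear`, `weight_quant`, `activation_quant`, `replace_linears`, and the choice to skip `lm_head` are all hypothetical names and decisions used for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def weight_quant(w: torch.Tensor) -> torch.Tensor:
    """Absmean quantization of weights to {-1, 0, +1}, rescaled back to float."""
    scale = 1.0 / w.abs().mean().clamp(min=1e-5)
    return (w * scale).round().clamp(-1, 1) / scale


def activation_quant(x: torch.Tensor) -> torch.Tensor:
    """Per-token absmax quantization of activations to 8 bits, rescaled back."""
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp(min=1e-5)
    return (x * scale).round().clamp(-128, 127) / scale


class BitLinear(nn.Linear):
    """Hypothetical BitLinear: a Linear layer that fake-quantizes in forward."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Straight-through estimator: quantized values in the forward pass,
        # full-precision gradients in the backward pass.
        x_q = x + (activation_quant(x) - x).detach()
        w_q = self.weight + (weight_quant(self.weight) - self.weight).detach()
        return F.linear(x_q, w_q, self.bias)


def replace_linears(module: nn.Module, skip=("lm_head",)) -> None:
    """Recursively swap nn.Linear children for BitLinear, copying weights."""
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear) and name not in skip:
            bit = BitLinear(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                device=child.weight.device,
                dtype=child.weight.dtype,
            )
            bit.load_state_dict(child.state_dict())
            setattr(module, name, bit)
        else:
            replace_linears(child, skip)
```

Calling `replace_linears(model)` on any `nn.Module` modifies it in place. The sketch leaves out details that a full BitNet layer may include, such as a normalization step before activation quantization.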

⚙️ Requirements

Python 3.8+
PyTorch 1.10+
Transformers 4.0+
Hugging Face Hub API
Weights & Biases

🧰 Installation

Make sure you have all required dependencies installed:

pip install torch transformers datasets wandb huggingface_hub
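
To confirm that the installation succeeded, a short check like the one below can help. It is only a sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is made up for this example, and the package list simply mirrors the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip install command in this card.
REQUIRED = ["torch", "transformers", "datasets", "wandb", "huggingface_hub"]


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```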

💥 How to Use

Using the trained model for inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from utils.bitnet_transformation import replace_linears_in_hf

model_id = "ejbejaranos/Bitnet-Llama3-from8BM-now2B"

# Load the BitNet checkpoint and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    token="YOUR_HF_TOKEN",  # replaces the deprecated use_auth_token argument
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Replace the model's linear layers with BitLinear layers for inference
replace_linears_in_hf(model)

# Move the model to the GPU when one is available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model.to(device)

# Tokenize the prompt and generate up to 50 tokens (prompt included)
prompt = "What is Machine Learning?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generate_ids = model.generate(**inputs, max_length=50)
output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]

print(output)

🧑‍🔬 Metrics

The following final metrics were logged to Weights & Biases during training:

final_loss: 1.4
final_perplexity: 4.2
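
The two values are related if, as is conventional, perplexity is the exponential of the mean token-level cross-entropy loss measured in nats. The card does not state how perplexity was computed, so that is an assumption. Under it, the numbers agree up to rounding, as this small check suggests:

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity as exp(loss), assuming the loss is mean cross-entropy in nats."""
    return math.exp(mean_cross_entropy)


print(round(perplexity(1.4), 2))  # 4.06
print(round(math.log(4.2), 3))    # 1.435, which rounds to the reported loss of 1.4
```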

🎯 Future Goals

Implement additional quantization layers for inference.
Test the model on different datasets and contexts.

📢 Contact

If you have questions, suggestions, or improvements, feel free to open an Issue or contact us through Hugging Face.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: [More Information Needed]
Hours used: [More Information Needed]
Cloud Provider: [More Information Needed]
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]
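
Once the fields above are known, a rough estimate can be computed as energy used multiplied by the carbon intensity of the local grid. The sketch below is a simplification and does not reproduce the calculator itself. The function name `estimate_emissions_kg` and every input in the example call are placeholders, not figures from this training run.

```python
def estimate_emissions_kg(
    hours: float,
    gpu_power_watts: float,
    num_gpus: int = 1,
    pue: float = 1.0,
    grid_kg_co2_per_kwh: float = 0.0,
) -> float:
    """Rough CO2-equivalent estimate in kg: energy (kWh) times grid intensity."""
    # Energy drawn by the accelerators, scaled by data-center overhead (PUE).
    energy_kwh = hours * gpu_power_watts * num_gpus / 1000.0 * pue
    return energy_kwh * grid_kg_co2_per_kwh


# Placeholder inputs for illustration only: 10 hours on one 300 W GPU
# with a grid intensity of 0.4 kg CO2eq per kWh.
print(estimate_emissions_kg(10, 300, grid_kg_co2_per_kwh=0.4))
```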

💡 Acknowledgments

Thanks to Hugging Face and Weights & Biases for providing support and tools.