# Llama3-8B-ITCL-Bitnet1.6B

## Description

**Llama3-8B-ITCL-Bitnet1.6B** is an experimental language model derived from Llama3 by converting its linear layers to BitLinear layers, which improves memory efficiency and inference speed. It is designed for natural language processing tasks and is particularly useful in environments where resource-efficient performance is required.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6419c2f6b4adb0e101b17b6c/WOcl2k9xdLT5aVqh-aERz.png)

## Features

- **Model Size:** 8B parameters
- **Architecture:** BitNet
- **BitLinear Layers:** Quantize weights to the ternary values 1, 0, and -1 (see the sketch after this list).
- **Optimized for:** Fast inference and memory efficiency
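
For intuition, here is a minimal standalone sketch of that ternary quantization, using the same absmean scheme as the `weight_quant` helper in the usage code below; the tensor values are made up for illustration:

```python
import torch

# Minimal sketch of ternary (1.58-bit) weight quantization, mirroring the
# weight_quant helper shown later in this README. Example values are made up.
w = torch.tensor([[0.42, -0.11], [-0.95, 0.03]])
scale = 1.0 / w.abs().mean().clamp_(min=1e-5)  # per-tensor absmean scale
u = (w * scale).round().clamp_(-1, 1)          # ternary values in {-1, 0, 1}
print(u)          # -> roughly tensor([[ 1., -0.], [-1., 0.]])
print(u / scale)  # dequantized values actually used in the matmul
```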

## Requirements

Make sure you have the following libraries installed; all of them are available via pip:

```bash
pip install transformers torch huggingface_hub wandb coloredlogs
```

## Usage

### Loading the Model

To use this model, load it from Hugging Face with the following code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.models.llama.modeling_llama import (
    LlamaDecoderLayer,
    LlamaMLP,
    LlamaRMSNorm,
    LlamaSdpaAttention,
)
import torch
from torch import nn
import torch.nn.functional as F
import coloredlogs
import logging

coloredlogs.install(level='INFO', fmt='%(asctime)s - %(levelname)s - %(message)s', logger=logging.getLogger())
logger = logging.getLogger(__name__)

HF_TOKEN = "your_api_key_here"

model_id = "ejbejaranos/Llama3-8B-ITCL-Bitnet1.6B"

# Load the pretrained BitNet checkpoint and its tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    token=HF_TOKEN
)

# Set the pad_token_id
model.config.pad_token_id = tokenizer.eos_token_id

def count_parameters(model):
    # Calculate the number of trainable parameters in billions
    num_params = sum(p.numel() for p in model.parameters() if p.requires_grad) / 10**9
    print(f"Model size: {num_params:.3f}B parameters")
    return num_params

def activation_quant(x):
    # Per-token 8-bit absmax quantization of the activations
    scale = 127.0 / x.abs().max(dim=-1, keepdim=True).values.clamp_(min=1e-5)
    y = (x * scale).round().clamp_(-128, 127)
    y = y / scale
    return y

def weight_quant(w):
    # Ternary (1.58-bit) absmean quantization of the weights to {-1, 0, 1}
    scale = 1.0 / w.abs().mean().clamp_(min=1e-5)
    u = (w * scale).round().clamp_(-1, 1)
    u = u / scale
    return u

class BitLinear(nn.Linear):
    def forward(self, x):
        w = self.weight  # a weight tensor with shape [d, k]
        x = x.to(w.device)
        RMSNorm = LlamaRMSNorm(x.shape[-1]).to(w.device)
        x_norm = RMSNorm(x)
        # Straight-through estimator: quantized values in the forward pass,
        # full-precision gradients in the backward pass
        x_quant = x_norm + (activation_quant(x_norm) - x_norm).detach()
        w_quant = w + (weight_quant(w) - w).detach()
        y = F.linear(x_quant, w_quant)
        return y

def convert_to_bitnet(model, copy_weights):
    for name, module in model.named_modules():
        # Swap the nn.Linear layers inside attention and MLP blocks for BitLinear
        if isinstance(module, LlamaSdpaAttention) or isinstance(module, LlamaMLP):
            for child_name, child_module in module.named_children():
                if isinstance(child_module, nn.Linear):
                    bitlinear = BitLinear(child_module.in_features, child_module.out_features, child_module.bias is not None).to(device="cuda:0")
                    if copy_weights:
                        bitlinear.weight = child_module.weight
                        if child_module.bias is not None:
                            bitlinear.bias = child_module.bias
                    setattr(module, child_name, bitlinear)
        # Drop the input layernorm, since BitLinear applies its own RMSNorm
        elif isinstance(module, LlamaDecoderLayer):
            for child_name, child_module in module.named_children():
                if isinstance(child_module, LlamaRMSNorm) and child_name == "input_layernorm":
                    setattr(module, child_name, nn.Identity().to(device="cuda:0"))

convert_to_bitnet(model, copy_weights=True)
model.to(device="cuda:0")

logger.info(f"Number of parameters in the model after extracting weights: {count_parameters(model):.3f}B")
logger.info(f"Reduced model structure:\n{model}")

prompt = "What is the color of the sky?"
inputs = tokenizer(prompt, return_tensors="pt", padding=True, truncation=True).to(model.device)
inputs['attention_mask'] = inputs['input_ids'] != model.config.pad_token_id

generate_ids = model.generate(inputs.input_ids, attention_mask=inputs['attention_mask'], max_length=250)
decoded_output = tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)

print(decoded_output[0])  # Print the generated response
```
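
After `convert_to_bitnet` runs, it is worth sanity-checking that the swap actually happened. A minimal check, reusing the `model` object from the snippet above (the attribute paths assume the standard `LlamaForCausalLM` layout):

```python
# Illustrative sanity check: projection layers should now be BitLinear,
# and the input layernorms should have been replaced with Identity.
first_layer = model.model.layers[0]
print(type(first_layer.self_attn.q_proj).__name__)  # expected: BitLinear
print(type(first_layer.input_layernorm).__name__)   # expected: Identity
```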

### Performing Inference

Generate text using the model to unleash its power! The loading snippet above already runs a greedy generation; a variation with sampling is sketched below.
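
A minimal sampling sketch, reusing `model` and `tokenizer` from above; the `temperature` and `top_p` values are illustrative, not tuned recommendations:

```python
# Illustrative sampling configuration for more varied outputs.
inputs = tokenizer("Tell me about BitNet models.", return_tensors="pt").to(model.device)
generate_ids = model.generate(
    inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_length=250,
    do_sample=True,   # sample instead of greedy decoding
    temperature=0.7,  # soften the token distribution
    top_p=0.9,        # nucleus sampling
)
print(tokenizer.batch_decode(generate_ids, skip_special_tokens=True)[0])
```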

## Training

To train the model, configure your settings and implement your training logic; a minimal fine-tuning sketch follows.
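
The sketch below is an assumed setup for illustration, not the authors' training recipe: it reuses `model` and `tokenizer` from the loading snippet and fine-tunes on a tiny made-up text list with the Hugging Face `Trainer`.

```python
import torch
from transformers import DataCollatorForLanguageModeling, Trainer, TrainingArguments

tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token

# Tiny made-up corpus, purely for illustration
texts = [
    "BitNet layers keep weights ternary.",
    "Quantization-aware fine-tuning example.",
]
encodings = tokenizer(texts, truncation=True, padding=True)

class TinyDataset(torch.utils.data.Dataset):
    def __len__(self):
        return len(texts)
    def __getitem__(self, i):
        return {k: torch.tensor(v[i]) for k, v in encodings.items()}

args = TrainingArguments(
    output_dir="bitnet-finetune",  # hypothetical output path
    per_device_train_batch_size=1,
    num_train_epochs=1,
    logging_steps=1,
    report_to="none",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=TinyDataset(),
    # Causal-LM collator: copies input_ids into labels (no masked-LM objective)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```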

## Contributions

If you would like to contribute to this project, please follow these steps:

1. Fork the repository.
2. Create your branch (`git checkout -b feature-new-feature`).
3. Make your changes and commit them.
4. Push to the branch.
5. Open a Pull Request.

## License

This project is licensed under the MIT License. See the `LICENSE` file for details.

## Contact

For questions or suggestions, feel free to reach out to me:

- **Email:** [email protected]
- **GitHub:** [ejbejaranos](https://github.com/ejbejaranos)