What is Mamba?


The Mamba model emerges as a groundbreaking player in the Natural Language Processing (NLP) landscape, set to redefine sequence modeling through its superior efficiency and performance.

Designed as a state-space model (SSM), Mamba efficiently handles intricate, information-dense sequences such as text, audio, and genomic data, distinguishing itself from widely used Transformer models by relying on "selective state spaces" for information processing.
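To make that concrete, the core of an SSM layer is a discretized linear recurrence over a hidden state, which can be computed in a single pass over the sequence. Below is a minimal NumPy sketch with made-up dimensions; in Mamba's selective variant, the B, C, and step-size parameters are additionally computed from the current input, which is what lets the model decide what to keep and what to forget:

import numpy as np

# Minimal sketch of a discretized state-space recurrence:
#   h_t = A_bar @ h_{t-1} + B_bar @ x_t
#   y_t = C @ h_t
# Dimensions are made up for illustration; real Mamba layers run this
# scan with input-dependent (selective) B, C, and step size.
d_state, d_model, seq_len = 16, 4, 10

A_bar = np.eye(d_state) * 0.9              # discretized state matrix
B_bar = np.random.randn(d_state, d_model) * 0.1
C = np.random.randn(d_model, d_state) * 0.1

x = np.random.randn(seq_len, d_model)      # input sequence
h = np.zeros(d_state)                      # hidden state
ys = []
for t in range(seq_len):                   # one step per token: linear time
    h = A_bar @ h + B_bar @ x[t]
    ys.append(C @ h)
y = np.stack(ys)                           # (seq_len, d_model) outputs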

Noteworthy features include linear scaling in sequence length, performance that outpaces similar-sized Transformers on tasks like language modeling and machine translation, and roughly fivefold higher inference throughput.

Mamba's versatility extends beyond NLP, showcasing top-tier performance in diverse fields like audio analysis and genomics.

The model's potential is promising, offering faster and more potent language models for tasks like text summarization, question answering, and dialogue systems. However, it is important to note that Mamba is still in the early stages of development, requiring further experimentation to solidify its long-term performance and reliability. Integration into existing NLP tools may necessitate adaptations and adjustments. Despite these considerations, the Mamba model stands as a catalyst for NLP advancements, holding the promise of revolutionizing the field with its efficiency, speed, and performance that surpass current state-of-the-art models.

Mamba-Chat:

https://github.com/havenhq/mamba-chat

Introducing Mamba-Chat 🐍, the pioneering chat language model that diverges from the transformer architecture by adopting a state-space model framework.

Built upon the research of Albert Gu and Tri Dao, "Mamba: Linear-Time Sequence Modeling with Selective State Spaces" (https://arxiv.org/abs/2312.00752), the model leverages their reference implementation.

Mamba-Chat is based on Mamba-2.8B and was fine-tuned on 16,000 samples from the HuggingFaceH4/ultrachat_200k dataset.
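The actual training script lives in the repository linked above. Purely as an illustrative sketch (the split and field names follow the public dataset card; the selection method is an assumption), carving out a 16,000-sample subset with the datasets library might look like this:

from datasets import load_dataset

# Illustrative only: one way to take a 16k-sample subset for fine-tuning
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split="train_sft")
subset = ds.shuffle(seed=42).select(range(16_000))
print(subset[0]["messages"][:2])  # each sample is a list of chat turns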

Usage:

# Install the Mamba SSM package and its causal-conv1d dependency
!pip install causal-conv1d==1.0.0
!pip install mamba-ssm==1.0.1

# In a notebook, `!export` only affects a throwaway subshell; use %env
# so the CUDA paths stay visible to later cells (Colab-specific setup)
%env LC_ALL=en_US.UTF-8
%env LD_LIBRARY_PATH=/usr/lib64-nvidia
%env LIBRARY_PATH=/usr/local/cuda/lib64/stubs
!ldconfig /usr/lib64-nvidia
import torch
from transformers import AutoTokenizer
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

device = "cuda"

# Mamba-Chat ships without its own chat template, so reuse Zephyr's
# (the conversation format the model was fine-tuned with)
tokenizer = AutoTokenizer.from_pretrained("havenhq/mamba-chat")
tokenizer.eos_token = "<|endoftext|>"
tokenizer.pad_token = tokenizer.eos_token
tokenizer.chat_template = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta").chat_template

# Load the 2.8B-parameter weights on the GPU in half precision
model = MambaLMHeadModel.from_pretrained("ayoubkirouane/Mamba-Chat-2.8B", device=device, dtype=torch.float16)
messages = []
user_message = """
Write your message here ..
"""

# Render the conversation with the chat template and move it to the GPU
messages.append(dict(role="user", content=user_message))
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)

# Sample a completion and keep only the text after the assistant marker
out = model.generate(input_ids=input_ids, max_length=2000, temperature=0.9, top_p=0.7, eos_token_id=tokenizer.eos_token_id)
decoded = tokenizer.batch_decode(out)
response = decoded[0].split("<|assistant|>\n")[-1]
messages.append(dict(role="assistant", content=response))
print("Model:", response)