Gemma2-9B-Swahili-IT

Gemma2-9B-Swahili-IT is a state-of-the-art open variant of Google's Gemma2-9B-IT model, fine-tuned for natural Swahili language understanding and generation. It improves over the base model on Swahili benchmarks such as MMLU-Swahili and sentiment analysis while keeping fine-tuning and inference resource-efficient.

Model Details

  • Developer: Alfaxad Eyembe
  • Base Model: google/gemma-2-9b-it
  • Model Type: Decoder-only transformer
  • Language(s): Swahili
  • License: Apache 2.0
  • Finetuning Approach: Low-Rank Adaptation (LoRA)

Training Data

The model was fine-tuned on a comprehensive dataset containing:

  • 67,017 instruction-response pairs
  • 16,273,709 total tokens
  • Average 242.83 tokens per example
  • High-quality, naturally-written Swahili content
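
The storage format of these pairs is not documented here. Purely as an illustration, a single instruction-response record could be represented as a Python dictionary like the one below (field names are hypothetical):

# Hypothetical layout of one instruction-response pair.
# The actual field names and file format used for training are not
# specified in this card; this is an illustrative sketch only.
example_pair = {
    "instruction": "Eleza maana ya akili bandia kwa lugha rahisi.",  # Swahili instruction
    "response": "Akili bandia ni tawi la sayansi ya kompyuta linalohusika na kujenga mifumo inayojifunza.",  # Swahili response
}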

Performance

Massive Multitask Language Understanding (MMLU) - Swahili

  • Base Model: 45.61% accuracy
  • Fine-tuned Model: 52.63% accuracy
  • Improvement: +7.02 percentage points

Sentiment Analysis - Swahili

  • Base Model: 84.85% accuracy
  • Fine-tuned Model: 86.00% accuracy
  • Improvement: +1.15 percentage points
  • Perfect response validity (100%)
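
The evaluation harness behind these numbers is not included in this card. As a rough sketch of how multiple-choice accuracy on an MMLU-style Swahili benchmark could be computed with this model, assuming each item is a dict with hypothetical keys "question", "choices" (four options), and "answer" (index of the correct option):

import torch

def multiple_choice_accuracy(model, tokenizer, items, max_new_tokens=5):
    # Greedy-decode a single answer letter per item and compare it to the gold index.
    letters = ["A", "B", "C", "D"]
    correct = 0
    for item in items:
        options = "\n".join(f"{l}. {c}" for l, c in zip(letters, item["choices"]))
        prompt = f"{item['question']}\n{options}\nJibu:"  # "Jibu" = "Answer"
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        completion = tokenizer.decode(
            out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        predicted = next((l for l in letters if l in completion.upper()), None)
        if predicted == letters[item["answer"]]:
            correct += 1
    return correct / len(items)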

Intended Use

This model is designed for:

  • Natural Swahili text generation
  • Question answering
  • Sentiment analysis
  • Creative writing
  • General instruction following in Swahili

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-9b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-9b-swahili-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Always set to eval mode for inference
model.eval()

# Example usage
prompt = "Eleza dhana ya uchumi wa kidijitali na umuhimu wake katika ulimwengu wa leo."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
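
Because the base model is instruction-tuned, prompts can also be wrapped in the Gemma chat format via the tokenizer's built-in chat template. A minimal sketch, assuming the fine-tuned tokenizer retains Gemma's template:

# Optional: format the prompt with the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Andika shairi fupi kuhusu umoja wa Afrika."}
]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    chat_outputs = model.generate(
        chat_inputs,
        max_new_tokens=300,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

# Decode only the newly generated tokens
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[1]:], skip_special_tokens=True))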

Training Details

  • Fine-tuning Method: LoRA
  • Training Steps: 400
  • Batch Size: 2
  • Gradient Accumulation Steps: 32
  • Learning Rate: 2e-4
  • Training Time: ~12 hours on an A100 GPU
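
The LoRA rank, alpha, dropout, and target modules are not listed above. A minimal sketch of how a comparable LoRA fine-tune could be configured with the peft library, with the unspecified values chosen as illustrative assumptions:

# Minimal LoRA setup sketch with peft; r, lora_alpha, lora_dropout and
# target_modules below are assumptions, not the values used for this model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
import torch

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

lora_config = LoraConfig(
    r=16,                      # assumed rank
    lora_alpha=32,             # assumed scaling factor
    lora_dropout=0.05,         # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM"
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable

# The documented hyperparameters above (learning rate 2e-4, batch size 2,
# gradient accumulation 32, 400 steps) would then go into the trainer's arguments.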

Key Features

  • Strong performance on structured tasks
  • Natural Swahili language generation
  • Balanced technical and conversational capabilities
  • Efficient parameter updates through LoRA
  • Improved response coherence and completion

Citation

@misc{gemma2-9b-swahili-it,
  author = {Alfaxad Eyembe},
  title = {Gemma2-9B-Swahili-IT: Swahili Variant of Gemma2-9B-IT},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
}

Contact

For questions or feedback, please reach out through the model's Hugging Face repository.
