Gemma2-9B-Swahili-IT

Gemma2-9B-Swahili-IT is a state-of-the-art open variant of Google's Gemma2-9B-IT model, fine-tuned for natural Swahili language understanding and generation. It improves over the base model on Swahili benchmarks such as MMLU-Swahili and sentiment analysis while keeping fine-tuning and inference resource-efficient.

Model Details

  • Developer: Alfaxad Eyembe
  • Base Model: google/gemma-2-9b-it
  • Model Type: Decoder-only transformer
  • Language(s): Swahili
  • License: Apache 2.0
  • Finetuning Approach: Low-Rank Adaptation (LoRA)

Training Data

The model was fine-tuned on a comprehensive dataset containing:

  • 67,017 instruction-response pairs
  • 16,273,709 total tokens
  • Average 242.83 tokens per example
  • High-quality, naturally-written Swahili content
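
The storage format of these pairs is not documented here. Purely as an illustration, a single instruction-response record could be represented as a Python dictionary like the one below (field names are hypothetical):

# Hypothetical layout of one instruction-response pair.
# The actual field names and file format used for training are not
# specified in this card; this is an illustrative sketch only.
example_pair = {
    "instruction": "Eleza maana ya akili bandia kwa lugha rahisi.",  # Swahili instruction
    "response": "Akili bandia ni tawi la sayansi ya kompyuta linalohusika na kujenga mifumo inayojifunza.",  # Swahili response
}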

Performance

Massive Multitask Language Understanding (MMLU) - Swahili

  • Base Model: 45.61% accuracy
  • Fine-tuned Model: 52.63% accuracy
  • Improvement: +7.02 percentage points

Sentiment Analysis - Swahili

  • Base Model: 84.85% accuracy
  • Fine-tuned Model: 86.00% accuracy
  • Improvement: +1.15 percentage points
  • Perfect response validity (100%)
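
The evaluation harness behind these numbers is not included in this card. As a rough sketch of how multiple-choice accuracy on an MMLU-style Swahili benchmark could be computed with this model, assuming each item is a dict with hypothetical keys "question", "choices" (four options), and "answer" (index of the correct option):

import torch

def multiple_choice_accuracy(model, tokenizer, items, max_new_tokens=5):
    # Greedy-decode a single answer letter per item and compare it to the gold index.
    letters = ["A", "B", "C", "D"]
    correct = 0
    for item in items:
        options = "\n".join(f"{l}. {c}" for l, c in zip(letters, item["choices"]))
        prompt = f"{item['question']}\n{options}\nJibu:"  # "Jibu" = "Answer"
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        with torch.no_grad():
            out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        completion = tokenizer.decode(
            out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
        )
        predicted = next((l for l in letters if l in completion.upper()), None)
        if predicted == letters[item["answer"]]:
            correct += 1
    return correct / len(items)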

Intended Use

This model is designed for:

  • Natural Swahili text generation
  • Question answering
  • Sentiment analysis
  • Creative writing
  • General instruction following in Swahili

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("alfaxadeyembe/gemma2-9b-swahili-it")
model = AutoModelForCausalLM.from_pretrained(
    "alfaxadeyembe/gemma2-9b-swahili-it",
    device_map="auto",
    torch_dtype=torch.bfloat16
)

# Always set to eval mode for inference
model.eval()

# Example usage
prompt = "Eleza dhana ya uchumi wa kidijitali na umuhimu wake katika ulimwengu wa leo."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=500,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
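
Because the base model is instruction-tuned, prompts can also be wrapped in the Gemma chat format via the tokenizer's built-in chat template. A minimal sketch, assuming the fine-tuned tokenizer retains Gemma's template:

# Optional: format the prompt with the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Andika shairi fupi kuhusu umoja wa Afrika."}
]
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant turn marker
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    chat_outputs = model.generate(
        chat_inputs,
        max_new_tokens=300,
        do_sample=True,
        temperature=0.7,
        top_p=0.95
    )

# Decode only the newly generated tokens
print(tokenizer.decode(chat_outputs[0][chat_inputs.shape[1]:], skip_special_tokens=True))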

Training Details

  • Fine-tuning Method: LoRA
  • Training Steps: 400
  • Batch Size: 2
  • Gradient Accumulation Steps: 32
  • Learning Rate: 2e-4
  • Training Time: ~12 hours on an A100 GPU
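
The LoRA rank, alpha, dropout, and target modules are not listed above. A minimal sketch of how a comparable LoRA fine-tune could be configured with the peft library, with the unspecified values chosen as illustrative assumptions:

# Minimal LoRA setup sketch with peft; r, lora_alpha, lora_dropout and
# target_modules below are assumptions, not the values used for this model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model
import torch

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2-9b-it",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

lora_config = LoraConfig(
    r=16,                      # assumed rank
    lora_alpha=32,             # assumed scaling factor
    lora_dropout=0.05,         # assumed dropout
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM"
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable

# The documented hyperparameters above (learning rate 2e-4, batch size 2,
# gradient accumulation 32, 400 steps) would then go into the trainer's arguments.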

Key Features

  • Strong performance on structured tasks
  • Natural Swahili language generation
  • Balanced technical and conversational capabilities
  • Efficient parameter updates through LoRA
  • Improved response coherence and completion

Citation

@misc{gemma2-9b-swahili-it,
  author = {Alfaxad Eyembe},
  title = {Gemma2-9B-Swahili-IT: Swahili Variant of Gemma2-9B-IT},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
}

Contact

For questions or feedback, please reach out through the model's Hugging Face repository.
