
Whisper Tiny Fine-tuned on Kalaallisut

I may redo the fine-tuning and start over from scratch, depending on how it goes. Please do not rely on its transcriptions for now.

This model is a fine-tuned version of the openai/whisper-tiny model on a small dataset of the Kalaallisut (Greenlandic) language. Whisper is a general-purpose speech recognition model trained on a large-scale dataset. However, this fine-tuned version on Kalaallisut may still produce unreliable transcriptions due to the small amount of available training data.

Model Details

  • Model Name: Whisper Tiny Fine-tuned on Kalaallisut
  • Base Model: openai/whisper-tiny
  • Fine-tuned on: Kalaallisut language dataset
  • Dataset: Audio-transcription pairs (limited in size and variety)
  • Purpose: Speech-to-text for the Kalaallisut language
  • License: MIT License

Fine-Tuning Process

The model has been fine-tuned incrementally with newly added data in Kalaallisut. Each fine-tuning session adds more data, which helps the model improve its understanding of the language. However, due to the small dataset, it is still prone to overfitting and may produce inaccurate or gibberish transcriptions in some cases.

  • Fine-tuning strategy: New data was added, and the model was fine-tuned using a low learning rate to avoid overwriting previously learned weights.
  • Learning Rate: 1e-5 (reduced in later fine-tuning stages to 5e-6).
  • Batch Size: 16 for training, 8 for evaluation
  • Evaluation Metric: Word Error Rate (WER) is used to evaluate the model’s performance.
  • Checkpoints: Frequent checkpointing and validation during training to prevent overfitting.
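Since WER is the evaluation metric used here, a minimal self-contained sketch of how it is typically computed (word-level edit distance divided by the number of reference words) may help; the `wer` helper below is illustrative and not part of this repository, and the Kalaallisut strings are made-up examples.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming (Levenshtein) edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four reference words -> WER = 0.25
print(wer("aluu qanoq ippit immaqa", "aluu qanoq ippit aap"))
```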

Recent Fine-tuning Updates:

  • New data: Additional Kalaallisut audio data was added, expanding the model’s vocabulary and helping it better understand different speech patterns.
  • Improved performance: The model has shown some improvements, but it still struggles with more complex or lengthy speech inputs.
  • Overfitting reduction: Adjustments such as lower learning rates and early stopping have been introduced to mitigate overfitting.
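The early stopping mentioned above can be sketched as a simple patience rule: stop training once the validation WER has not improved for a fixed number of evaluations. This is an illustrative sketch of the criterion, not the exact training code used for this model.

```python
def should_stop(wer_history, patience=3, min_delta=0.0):
    """Return True once the last `patience` evaluations show no WER improvement."""
    if len(wer_history) <= patience:
        return False
    best_before = min(wer_history[:-patience])
    recent_best = min(wer_history[-patience:])
    # Stop if none of the recent evaluations beat the earlier best by min_delta.
    return recent_best >= best_before - min_delta

# Validation WER per evaluation step: improvement stalls after 0.80.
history = [0.92, 0.85, 0.80, 0.81, 0.82, 0.83]
print(should_stop(history, patience=3))  # True: no improvement in last 3 evals
```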

Training Data

The training data consists of a small set of audio-transcription pairs in Kalaallisut. Due to the limited size of the dataset, the model’s performance is not fully reliable for general use and may produce inaccurate or gibberish transcriptions for complex or diverse audio inputs.

  • Hours of Audio: The dataset is still limited to a few hours of speech data.
  • Dataset Type: Spoken words and phrases in Kalaallisut, with some conversational audio added in later updates.
  • Limitations: The model’s performance is limited by the small dataset, and it may not generalize well to more complex audio inputs, especially those containing less common phrases or dialects.

Known Issues

  • Transcription Quality: The model may produce gibberish or incorrect transcriptions, especially for longer or more complex audio inputs.
  • Small dataset limitations: The model's vocabulary is limited, and it struggles with less common words or more phonetically complex speech.
  • Noisy or fast speech: The model may produce unintelligible transcriptions for audio that is noisy or spoken too quickly.

Limitations

  • Generalization issues: Due to the small dataset, the model may not generalize well to new or more complex audio inputs.
  • Inaccurate transcriptions: The model may produce gibberish or incorrect transcriptions in certain scenarios.
  • Unsuitable for production use: The model is currently not suitable for large-scale production applications and is intended for experimentation, research, or small projects.

Intended Use

This model can be used for:

  • Experimentation and research: Useful for testing speech-to-text in the Kalaallisut language, but note that the output may not always be reliable.
  • Small projects: It can be useful for transcription tasks in small-scale projects that require Kalaallisut language support.
  • Further fine-tuning: The model is suitable for further fine-tuning with additional data to improve its performance.

Usage Example

To use the model for transcription, you can load it with the Hugging Face transformers library:

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

# Load the model and processor
processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Load and process an audio file (replace this with your audio loading method)
audio_array = ...  # Load your audio file as a numpy array or waveform

# Prepare the input features for the model
input_features = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features

# Generate transcription from the audio input
generated_ids = model.generate(input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Transcription: {transcription}")
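Whisper expects mono audio sampled at 16 kHz. As a dependency-light sketch of one way to obtain the `audio_array` used above (assuming a 16-bit PCM mono WAV file already at 16 kHz; the file name `tone.wav` and the generated test tone are purely illustrative), the standard-library `wave` module plus NumPy suffice:

```python
import wave
import numpy as np

def load_wav_as_float_array(path):
    """Read a 16-bit PCM mono WAV file and return float32 samples in [-1, 1]."""
    with wave.open(path, "rb") as wf:
        assert wf.getnchannels() == 1, "expected mono audio"
        assert wf.getsampwidth() == 2, "expected 16-bit PCM"
        frames = wf.readframes(wf.getnframes())
    samples = np.frombuffer(frames, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0

# Write a one-second 440 Hz test tone at 16 kHz, then load it back.
with wave.open("tone.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    tone = (np.sin(2 * np.pi * 440 * np.arange(16000) / 16000) * 32767).astype(np.int16)
    wf.writeframes(tone.tobytes())

audio_array = load_wav_as_float_array("tone.wav")
print(len(audio_array))  # 16000 samples = one second at 16 kHz
```

For real recordings at other sample rates, resample to 16 kHz first (e.g. with an audio library of your choice) before passing the array to the processor.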
Model size: 37.8M parameters (F32, Safetensors)