---
language:
- kal # Kalaallisut language (Greenlandic)
license: mit # Model license
metrics:
- wer # Word Error Rate (WER) used as evaluation metric
model_name: Whisper Tiny Fine-tuned on Kalaallisut
tags:
- whisper
- automatic-speech-recognition
- speech-to-text
- kalaallisut
- greenlandic
pipeline_tag: automatic-speech-recognition
widget:
- src: https://huggingface.co./datasets/your-dataset/sample_audio.mp3 # Replace with actual path to audio file
---

# Whisper Tiny Fine-tuned on Kalaallisut

There is a chance that I will redo this model and start over from the beginning, depending on how training goes. Please do not rely on its transcriptions yet.

This model is a fine-tuned version of `openai/whisper-tiny` on a **small dataset** of Kalaallisut (Greenlandic) speech. Whisper is a general-purpose speech recognition model trained on a large-scale dataset; however, this Kalaallisut fine-tune **may still produce unreliable transcriptions** due to the small amount of available training data.

## Model Details

- **Model Name**: Whisper Tiny Fine-tuned on Kalaallisut
- **Base Model**: [openai/whisper-tiny](https://huggingface.co./openai/whisper-tiny)
- **Fine-tuned on**: Kalaallisut language dataset
- **Dataset**: Audio-transcription pairs (limited in size and variety)
- **Purpose**: Speech-to-text for the Kalaallisut language
- **License**: MIT License

## Fine-Tuning Process

The model has been fine-tuned **incrementally** with newly added Kalaallisut data. Each fine-tuning session adds more data, which helps the model improve its understanding of the language. However, due to the small dataset, it is still prone to **overfitting** and may produce **inaccurate or gibberish transcriptions** in some cases.

- **Fine-tuning strategy**: New data was added, and the model was fine-tuned with a low learning rate to avoid overwriting previously learned weights (see the training-arguments sketch after the update notes below).
- **Learning Rate**: 1e-5 (reduced to 5e-6 in later fine-tuning stages).
- **Batch Size**: 16 for training, 8 for evaluation.
- **Evaluation Metric**: Word Error Rate (WER) is used to evaluate the model's performance.
- **Checkpoints**: Frequent checkpointing and validation during training to prevent overfitting.

### Recent Fine-tuning Updates

- **New data**: Additional Kalaallisut audio was added, expanding the model's vocabulary and helping it handle a wider range of speech patterns.
- **Improved performance**: The model has shown some improvement, but it still struggles with more complex or lengthy speech inputs.
- **Overfitting reduction**: Adjustments such as lower learning rates and early stopping have been introduced to mitigate overfitting.
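For reference, the hyperparameters above map roughly onto a `Seq2SeqTrainingArguments` configuration like the one below. This is a minimal sketch, not the exact training script: the output directory, checkpointing step counts, and early-stopping patience are assumptions, and the dataset, data collator, and `Seq2SeqTrainer` wiring are omitted.

```python
import evaluate
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
wer_metric = evaluate.load("wer")  # WER, the metric reported on this card

def compute_metrics(pred):
    """Decode predicted and reference token ids, then compute WER."""
    label_ids = pred.label_ids
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id  # undo label padding
    pred_str = processor.batch_decode(pred.predictions, skip_special_tokens=True)
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)
    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-tiny-kalaallisut",  # hypothetical output path
    learning_rate=1e-5,                       # lowered to 5e-6 in later stages
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    eval_strategy="steps",                    # `evaluation_strategy` in older transformers versions
    save_steps=500,                           # assumed value, not documented on this card
    eval_steps=500,                           # assumed value, not documented on this card
    predict_with_generate=True,               # decode with generate() so WER can be computed
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,                  # lower WER is better
)

# Early stopping (mentioned above) would be passed to the trainer as a callback, e.g.
# Seq2SeqTrainer(..., callbacks=[EarlyStoppingCallback(early_stopping_patience=3)])
```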
## Training Data

The training data consists of a small set of **audio-transcription pairs** in Kalaallisut. Due to the **limited size of the dataset**, the model's performance is not fully reliable for general use, and it may produce **inaccurate or gibberish transcriptions** for complex or diverse audio inputs.

- **Hours of Audio**: The dataset is still limited to a few hours of speech.
- **Dataset Type**: Spoken words and phrases in Kalaallisut, with some conversational audio added in later updates.
- **Limitations**: The small dataset limits performance, and the model may not generalize well to more complex audio, especially audio containing less common phrases or dialects.

### Known Issues

- **Transcription Quality**: The model may produce **gibberish or incorrect transcriptions**, especially for longer or more complex audio inputs.
- **Small dataset limitations**: The model's vocabulary is limited, and it struggles with **less common words** or more phonetically complex speech.
- **Noisy or fast speech**: The model may produce unintelligible transcriptions for audio that is noisy or spoken too quickly.

### Limitations

- **Generalization issues**: Due to the small dataset, the model may not generalize well to new or more complex audio inputs.
- **Inaccurate transcriptions**: The model may produce **gibberish or incorrect transcriptions** in certain scenarios.
- **Unsuitable for production use**: The model is not currently suitable for large-scale production applications and is intended for experimentation, research, or small projects.

### Intended Use

This model can be used for:

- **Experimentation and research**: Useful for testing speech-to-text in Kalaallisut, but note that the output may not always be reliable.
- **Small projects**: Useful for transcription tasks in small-scale projects that need Kalaallisut language support.
- **Further fine-tuning**: Suitable as a starting point for further fine-tuning with additional data to improve performance.

### Usage Example

To use the model for transcription, load it with the Hugging Face `transformers` library:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch

# Load the model and processor
processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Load an audio file as a 16 kHz waveform (replace this with your own audio loading)
audio_array = ...  # e.g. a 1-D numpy array of samples at 16 kHz

# Prepare the input features for the model
input_features = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features

# Generate the transcription from the audio input
with torch.no_grad():
    generated_ids = model.generate(input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Transcription: {transcription}")
```
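If you have a reference transcript for your audio, the WER metric reported on this card can be computed with the `evaluate` library. The small sketch below reuses the `transcription` string from the usage example above; the reference string is a placeholder to be replaced with a human-made transcript of the same audio.

```python
import evaluate

# Word Error Rate (WER), the metric reported on this model card
wer_metric = evaluate.load("wer")

# `transcription` comes from the usage example above; replace the placeholder
# reference with a human transcript of the same audio.
reference = "..."  # placeholder reference transcript
wer = wer_metric.compute(predictions=[transcription], references=[reference])
print(f"WER: {wer:.2%}")
```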