# Whisper Tiny Fine-tuned on Kalaallisut

This model is a fine-tuned version of openai/whisper-tiny, trained on a very small dataset of Kalaallisut (Greenlandic) speech. Whisper is a general-purpose speech recognition model trained on a large-scale dataset; because of the limited amount of Kalaallisut training data, however, this fine-tuned version is not fully reliable.

## Model Details

- Model: Whisper Tiny fine-tuned on Kalaallisut
- Base Model: openai/whisper-tiny
- Training Data: A very small set of audio-transcription pairs in Kalaallisut
- Purpose: Speech-to-text for the Kalaallisut language
- License: MIT License

## Training Data

This model was fine-tuned on a small dataset of Kalaallisut speech. The dataset is not comprehensive and does not cover all aspects of the language, so the model is not reliable for general use and may produce incorrect transcriptions.

## Limitations

- The model has been trained on a very small dataset (only a few hours of audio).
- The transcriptions may not be accurate, especially for more complex audio inputs; a quick way to measure this on your own recordings is sketched below.
- This model is a proof of concept and not intended for production use.
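
If you want to quantify how far off the transcriptions are, word error rate (WER) is the usual measure. Below is a minimal sketch using the Hugging Face `evaluate` library; the transcript strings are hypothetical placeholders for your own data:

```python
import evaluate

# Load the standard word-error-rate metric
wer_metric = evaluate.load("wer")

# Hypothetical placeholders: pair each ground-truth transcript with the
# model's output for the same audio clip
references = ["ground-truth Kalaallisut transcript"]
predictions = ["model transcription of the same clip"]

wer = wer_metric.compute(references=references, predictions=predictions)
print(f"WER: {wer:.2%}")  # lower is better; 0% means a perfect match
```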

## Intended Use

The model can be used for:

- Experimentation and testing with Kalaallisut speech-to-text tasks.
- Small projects, or as a foundation for further fine-tuning with more data (see the sketch after this list).
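
To illustrate the second point, here is a minimal sketch of continued fine-tuning from this checkpoint. It assumes a local `audiofolder`-style dataset whose `metadata.csv` provides a `transcription` column (the directory name and column name are assumptions; adapt them to your data) and follows the standard Hugging Face seq2seq training recipe for Whisper:

```python
import torch
from datasets import load_dataset, Audio
from transformers import (
    WhisperProcessor,
    WhisperForConditionalGeneration,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Hypothetical local dataset: an "audiofolder" directory with a metadata.csv
# that has a "transcription" column next to each audio file
dataset = load_dataset("audiofolder", data_dir="my_kalaallisut_data", split="train")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))

def prepare(batch):
    # Convert raw audio to log-mel input features and the transcript to label ids
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["transcription"]).input_ids
    return batch

dataset = dataset.map(prepare, remove_columns=dataset.column_names)

def collate(features):
    # Pad the audio features and the label ids separately
    batch = processor.feature_extractor.pad(
        [{"input_features": f["input_features"]} for f in features],
        return_tensors="pt",
    )
    labels_batch = processor.tokenizer.pad(
        [{"input_ids": f["labels"]} for f in features],
        return_tensors="pt",
    )
    # Replace padding with -100 so it is ignored by the loss
    labels = labels_batch["input_ids"].masked_fill(
        labels_batch["attention_mask"].ne(1), -100
    )
    # Drop the leading start token; the model re-adds it when shifting labels
    if (labels[:, 0] == model.config.decoder_start_token_id).all():
        labels = labels[:, 1:]
    batch["labels"] = labels
    return batch

training_args = Seq2SeqTrainingArguments(
    output_dir="whisper-tiny-kalaallisut-continued",
    per_device_train_batch_size=8,
    learning_rate=1e-5,
    max_steps=500,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    data_collator=collate,
)
trainer.train()
```

The collator pads audio features and label ids separately because they have different lengths, and masks label padding with -100 so it does not contribute to the loss.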

## Usage Example

You can transcribe an audio file with the following code. The example loads audio with `librosa`; any loader that yields a 16 kHz float array will work, and `"audio.wav"` is a placeholder for your own file:

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

# Load the fine-tuned model and its processor
processor = WhisperProcessor.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")
model = WhisperForConditionalGeneration.from_pretrained("VoiceLessQ/whisper-tiny-kalaallisut")

# Load an audio file and resample it to the 16 kHz rate Whisper expects;
# replace "audio.wav" with the path to your own recording
audio_array, _ = librosa.load("audio.wav", sr=16000)
input_features = processor(audio_array, sampling_rate=16000, return_tensors="pt").input_features

# Generate the transcription (no gradients needed at inference time)
with torch.no_grad():
    generated_ids = model.generate(input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(f"Transcription: {transcription}")
```
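
For quick experiments, the same checkpoint can also be run through the high-level `pipeline` API, which handles audio loading for you and, via `chunk_length_s`, can transcribe recordings longer than Whisper's 30-second window (the file path is again a placeholder):

```python
from transformers import pipeline

# Build an automatic-speech-recognition pipeline around the fine-tuned checkpoint
asr = pipeline(
    "automatic-speech-recognition",
    model="VoiceLessQ/whisper-tiny-kalaallisut",
)

# chunk_length_s splits long recordings into 30-second windows before decoding;
# replace "audio.wav" with the path to your own file
result = asr("audio.wav", chunk_length_s=30)
print(result["text"])
```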