Moroccan Darija Embedding Models

This repository contains word embedding models trained for Moroccan Darija, the Arabic dialect widely spoken in Morocco. It currently includes FastText embeddings trained on Al Atlas, a curated dataset of Moroccan Darija text.

Features

  • FastText embeddings: Pre-trained word vectors using FastText, which supports subword information and works well with dialectal and morphologically rich languages.
  • Efficient training pipeline: Code for training FastText embeddings on Moroccan Darija datasets.
  • Pre-trained models: Ready-to-use embeddings for downstream NLP tasks, available on the Hugging Face Hub.

Installation

Clone the GitHub repository and install the required dependencies:

git clone https://github.com/BounharAbdelaziz/Moroccan-Darija-Embedding.git
cd Moroccan-Darija-Embedding
pip install -r requirements.txt

Usage

Loading Pre-trained Embeddings

You can load the trained FastText model using the fasttext Python library:

import fasttext

# Download the model file from the Hub first: https://huggingface.co/atlasia/Moroccan-Darija-Embedding
model = fasttext.load_model("fasttext_cbow_v0.bin")
word_vector = model.get_word_vector("كلمة")  # NumPy array holding the embedding for this word
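
Once a vector is loaded, a common next step is comparing words. The following is a minimal sketch, not part of the repository's code: the `cosine_similarity` helper, the `MODEL_PATH` constant, and the second example word are illustrative assumptions. It assumes the model file has already been downloaded into the working directory and skips the model-dependent part otherwise.

```python
import math
import os


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Assumption: the file was downloaded from the Hub into the current directory.
MODEL_PATH = "fasttext_cbow_v0.bin"

if os.path.exists(MODEL_PATH):
    import fasttext

    model = fasttext.load_model(MODEL_PATH)

    # Example word pair chosen for illustration only.
    v1 = model.get_word_vector("كلمة")
    v2 = model.get_word_vector("كلام")
    print("cosine similarity:", cosine_similarity(v1, v2))

    # fasttext's built-in lookup returns a list of (similarity, word) tuples.
    for score, word in model.get_nearest_neighbors("كلمة", k=5):
        print(f"{word}\t{score:.3f}")
```

Because FastText composes vectors from character n-grams, `get_word_vector` also returns a vector for words that never appeared in the training data, so the same comparison works for spelling variants.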

Roadmap

  • ✅ FastText embeddings
  • ⏳ Word2Vec and GloVe embeddings
  • ⏳ Transformer-based contextual embeddings (e.g., BERT, RoBERTa)
  • ⏳ Sentence embeddings: Continue training the MoRdern-Bert model.

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests to improve the models and codebase.
