Terjman-Nano (77M params)

Our model is built on the Transformer architecture. It is a fine-tuned version of Helsinki-NLP/opus-mt-en-ar on the darija_english dataset, enhanced with curated corpora to improve translation quality.

It achieves the following results on the evaluation set:

Loss: 3.2038
Bleu: 10.6239
Gen Len: 35.2727

Try it out on our dedicated Terjman-Nano Space 🤗

Usage

You can integrate the model into your projects or workflows via the Hugging Face Transformers library. Here's a basic example of how to use it in Python:

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("atlasia/Terjman-Nano")
model = AutoModelForSeq2SeqLM.from_pretrained("atlasia/Terjman-Nano")

# Define the English text to translate
input_text = "Your English text goes here."

# Tokenize the input text
input_tokens = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)

# Perform translation
output_tokens = model.generate(**input_tokens)

# Decode the output tokens
output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

print("Translation:", output_text)
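
To translate several sentences at once, the same steps can be wrapped in a small helper. This is a minimal sketch, not part of the model's own API: the function name `translate_batch` and the default values for `max_new_tokens` and `num_beams` are assumptions chosen for illustration, not tuned settings.

```python
from typing import List


def translate_batch(texts: List[str], tokenizer, model,
                    max_new_tokens: int = 128, num_beams: int = 4) -> List[str]:
    """Translate a list of English sentences with a single generate() call."""
    # Pad to the longest sentence so the batch forms one rectangular tensor.
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # max_new_tokens and num_beams are illustrative defaults (assumptions).
    output_ids = model.generate(
        **batch, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    # batch_decode returns one string per input sentence, in the same order.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as in the example above, `translate_batch(["Hello!", "How are you?"], tokenizer, model)` should return a list of two translated strings.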

Example

Let's see an example of translating English into Moroccan Darija (Arabic script):

Input: "Hi my friend, can you tell me a joke in moroccan darija? I'd be happy to hear that from you!"

Output: "مرحبا يا صديقي، يمكن تقال لي نكتة فالداريا المغاربية؟ أنا سَأكُونُ سعيد بسمْاع هادشي منك!"

Limitations

This version has some limitations, mainly due to the tokenizer. We're currently collecting more data to keep improving the model.

Feedback

We're working to improve the model's performance and usability incrementally. If you have any feedback or suggestions, or if you encounter any issues, please don't hesitate to reach out to us.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 3e-05
train_batch_size: 64
eval_batch_size: 64
seed: 42
gradient_accumulation_steps: 4
total_train_batch_size: 256
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.03
num_epochs: 40
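
To make the relationship between these values concrete, here is a rough sketch of how the effective batch size and the linear learning-rate schedule can be derived from them. It assumes a single training device, takes the total step count (5600) from the final row of the training results table, and assumes the warmup length is rounded up from `warmup_ratio * total_steps`. The function name `linear_lr` is hypothetical.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 3e-5
train_batch_size = 64
gradient_accumulation_steps = 4
warmup_ratio = 0.03
total_steps = 5600  # final step reported in the training results table

# Assumption: one device, so effective batch = batch size x accumulation steps.
effective_batch_size = train_batch_size * gradient_accumulation_steps

# Assumption: warmup length is the ratio times the total steps, rounded up.
warmup_steps = math.ceil(warmup_ratio * total_steps)


def linear_lr(step: int) -> float:
    """Linear warmup to the peak rate, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


print("effective batch size:", effective_batch_size)
print("warmup steps:", warmup_steps)
print("lr at the midpoint of training:", linear_lr(total_steps // 2))
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 256, and the learning rate peaks at 3e-05 once warmup ends.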

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Gen Len
No log	0.9982	140	4.8431	6.4393	31.6253
No log	1.9964	280	3.9077	7.7671	36.1047
No log	2.9947	420	3.6453	8.5008	35.303
4.7676	4.0	561	3.5034	9.293	34.416
4.7676	4.9982	701	3.4161	9.3322	34.5702
4.7676	5.9964	841	3.3582	9.6792	34.438
4.7676	6.9947	981	3.3182	9.8804	35.27
3.7555	8.0	1122	3.2904	10.0802	34.7576
3.7555	8.9982	1262	3.2684	10.2161	34.1873
3.7555	9.9964	1402	3.2534	10.0777	34.6612
3.6059	10.9947	1542	3.2420	10.637	34.6281
3.6059	12.0	1683	3.2325	10.6797	35.1185
3.6059	12.9982	1823	3.2267	10.5413	34.8898
3.6059	13.9964	1963	3.2210	10.6098	35.0
3.5561	14.9947	2103	3.2169	10.4863	34.8567
3.5561	16.0	2244	3.2141	10.6152	34.7328
3.5561	16.9982	2384	3.2119	10.6701	34.8815
3.5363	17.9964	2524	3.2100	10.5632	34.7576
3.5363	18.9947	2664	3.2089	10.5707	34.8623
3.5363	20.0	2805	3.2077	10.6275	34.8678
3.5363	20.9982	2945	3.2066	10.6857	35.0413
3.5299	21.9964	3085	3.2062	10.8112	35.3251
3.5299	22.9947	3225	3.2056	10.6908	34.0413
3.5299	24.0	3366	3.2051	10.5719	35.4298
3.5241	24.9982	3506	3.2046	10.5667	34.9036
3.5241	25.9964	3646	3.2042	10.9389	35.3361
3.5241	26.9947	3786	3.2043	10.5972	34.9532
3.5241	28.0	3927	3.2043	10.6626	35.3113
3.5247	28.9982	4067	3.2042	10.5286	35.0689
3.5247	29.9964	4207	3.2038	10.6298	34.4959
3.5247	30.9947	4347	3.2039	10.5897	34.9449
3.5247	32.0	4488	3.2037	10.7971	35.4711
3.5208	32.9982	4628	3.2039	10.6665	34.8402
3.5208	33.9964	4768	3.2039	10.5543	35.27
3.5208	34.9947	4908	3.2034	10.785	35.022
3.5159	36.0	5049	3.2037	10.6311	34.3388
3.5159	36.9982	5189	3.2037	10.4617	34.3085
3.5159	37.9964	5329	3.2037	10.7629	34.4518
3.5159	38.9947	5469	3.2036	10.6729	35.2066
3.524	39.9287	5600	3.2038	10.6239	35.2727
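
The metrics in the table flatten out after roughly epoch 12, and the final row (which matches the evaluation results reported at the top) is not the best row on either metric. The sketch below uses a hand-copied subset of rows from the table to show how the best checkpoint by BLEU or by validation loss could be picked. The variable names are illustrative.

```python
# (epoch, step, validation_loss, bleu): a subset of rows copied from the table above.
rows = [
    (0.9982, 140, 4.8431, 6.4393),
    (21.9964, 3085, 3.2062, 10.8112),
    (25.9964, 3646, 3.2042, 10.9389),
    (32.0, 4488, 3.2037, 10.7971),
    (34.9947, 4908, 3.2034, 10.785),
    (39.9287, 5600, 3.2038, 10.6239),
]

# Highest BLEU and lowest validation loss need not fall on the same step.
best_bleu = max(rows, key=lambda r: r[3])
lowest_loss = min(rows, key=lambda r: r[2])
final = rows[-1]

print("best BLEU:", best_bleu[3], "at step", best_bleu[1])
print("lowest validation loss:", lowest_loss[2], "at step", lowest_loss[1])
print("final checkpoint BLEU:", final[3], "at step", final[1])
```

The differences between the late checkpoints are small (under 0.5 BLEU), so they may reflect evaluation noise rather than a real quality gap.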

Framework versions

Transformers 4.40.2
Pytorch 2.2.1+cu121
Datasets 2.19.1
Tokenizers 0.19.1
