BounharAbdelaziz committed on
Commit
51af4a0
1 Parent(s): 1d48acd

Update README.md

  1. README.md +60 -15
README.md CHANGED
@@ -4,6 +4,10 @@ tags:
  model-index:
  - name: Transliteration-Moroccan-Darija
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -11,23 +15,16 @@ should probably proofread and complete it, then remove this comment. -->

  # Transliteration-Moroccan-Darija

- This model was trained from scratch on an unknown dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 3e-05
@@ -41,9 +38,57 @@ The following hyperparameters were used during training:
  - lr_scheduler_warmup_ratio: 0.02
  - num_epochs: 120

- ### Framework versions

  - Transformers 4.39.2
  - Pytorch 2.2.2+cpu
  - Datasets 2.18.0
  - Tokenizers 0.15.2
 
  model-index:
  - name: Transliteration-Moroccan-Darija
  results: []
+ datasets:
+ - atlasia/ATAM
+ language:
+ - ar
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 

  # Transliteration-Moroccan-Darija

+ This model is trained to convert Moroccan Darija text written in Arabizi (Latin script) to Arabic letters.
+ Whether you're dealing with informal texts, social media posts, or any other content in Moroccan Arabizi, the model is here to help you accurately transliterate it into Arabic script.

+ ## Model Overview

+ Our model is built upon the powerful Transformer architecture, leveraging state-of-the-art natural language processing techniques.
+ It has been trained from scratch on the "atlasia/ATAM" dataset, specifically for the task of transliterating Moroccan Darija Arabizi into Arabic letters, ensuring high-quality and accurate transliterations.
+ Furthermore, we trained a BPE Tokenizer specifically for this task.
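
The commit does not show how the BPE tokenizer was trained. As a rough illustration of what BPE training does, the toy sketch below learns merge rules from a handful of Arabizi-like words. The function name `learn_bpe_merges`, the sample words, and the tie-breaking rule are assumptions made for this example and are not taken from the model's code.

```python
from collections import Counter


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy version)."""
    # Start with every word split into single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for left, right in zip(symbols, symbols[1:]):
                pair_counts[(left, right)] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties go to the lexicographically
        # largest pair, which is an arbitrary choice for this sketch.
        best = max(pair_counts, key=lambda pair: (pair_counts[pair], pair))
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    # Hypothetical sample words, not drawn from the ATAM dataset.
    print(learn_bpe_merges(["kayn", "kayn", "kayna", "chi"], num_merges=3))
```

A production tokenizer would normally be trained with a dedicated library on the full corpus; this sketch only shows the merge-learning idea.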

+ ## Training hyperparameters

  The following hyperparameters were used during training:
  - learning_rate: 3e-05
 
  - lr_scheduler_warmup_ratio: 0.02
  - num_epochs: 120
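
To make the warmup ratio concrete, here is a small sketch of how a 0.02 warmup ratio could translate into a learning-rate curve that peaks at 3e-05. The scheduler type and the total number of optimizer steps are not visible in this diff, so the linear decay and the `total_steps` value below are assumptions, and `warmup_steps` and `lr_at_step` are hypothetical helper names.

```python
import math

PEAK_LR = 3e-05        # learning_rate from the list above
WARMUP_RATIO = 0.02    # lr_scheduler_warmup_ratio from the list above


def warmup_steps(total_steps, warmup_ratio=WARMUP_RATIO):
    """Number of warmup steps implied by a warmup ratio."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at_step(step, total_steps, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`, assuming linear warmup then linear decay to zero."""
    ws = warmup_steps(total_steps, warmup_ratio)
    if step < ws:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, ws)
    # Linear decay from the peak down to 0 at the final step (assumed schedule).
    remaining = max(1, total_steps - ws)
    return peak_lr * max(0.0, (total_steps - step) / remaining)


if __name__ == "__main__":
    total_steps = 10_000  # hypothetical; depends on dataset size and batch size
    for step in (0, warmup_steps(total_steps), total_steps // 2, total_steps):
        print(step, lr_at_step(step, total_steps))
```

With a ratio this small, only the first 2% of training is spent ramping up; the rest follows whatever decay the configured scheduler applies.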

+ ## Framework versions

  - Transformers 4.39.2
  - Pytorch 2.2.2+cpu
  - Datasets 2.18.0
  - Tokenizers 0.15.2
+
+ ## Usage
+
+ Using our model for transliteration is simple and straightforward.
+ You can integrate it into your projects or workflows via the Hugging Face Transformers library.
+ Here's a basic example of how to use the model in Python:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ # Load the tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained("BounharAbdelaziz/Transliteration-Moroccan-Darija")
+ model = AutoModelForSeq2SeqLM.from_pretrained("BounharAbdelaziz/Transliteration-Moroccan-Darija")
+
+ # Define your Moroccan Darija Arabizi text
+ input_text = "Your Moroccan Darija Arabizi text goes here."
+
+ # Tokenize the input text
+ input_tokens = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)
+
+ # Perform transliteration
+ output_tokens = model.generate(**input_tokens)
+
+ # Decode the output tokens
+ output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
+
+ print("Transliteration:", output_text)
+ ```
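
For more than one input string, the same calls can be wrapped in a small batching helper. This is a sketch and not part of the model card: `transliterate_batch`, `chunked`, and the default `batch_size` and `max_new_tokens` values are assumptions, and `tokenizer` and `model` are expected to be the objects loaded in the snippet above.

```python
from typing import Iterable, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def transliterate_batch(
    texts: Iterable[str],
    tokenizer,
    model,
    batch_size: int = 16,       # assumed default, tune for your hardware
    max_new_tokens: int = 128,  # assumed cap on generated length
) -> List[str]:
    """Transliterate many Arabizi strings, one padded batch at a time."""
    results: List[str] = []
    for batch in chunked(list(texts), batch_size):
        # Pad within the batch so the inputs form a rectangular tensor.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
        # Decode every sequence in the batch, dropping padding and special tokens.
        results.extend(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
    return results


# Example call, once `tokenizer` and `model` are loaded as shown earlier:
# print(transliterate_batch(["kayn chi"], tokenizer, model))
```

Outputs come back in the same order as the inputs, so the result can be zipped with the original list.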
+
+ ## Example
+
+ Let's see an example of transliterating Moroccan Darija Arabizi to Arabic:
+
+ **Input**: "kayn chi"
+
+ **Output**: "كاين شي"
+
+
+ ## Limitations
+
+ This version has some limitations, mainly due to the tokenizer.
+ We're currently collecting more data with the aim of continuous improvement.
+
+ ## Feedback
+
+ We're continuously striving to improve our model's performance and usability, and we will be improving it incrementally.
+ If you have any feedback, suggestions, or encounter any issues, please don't hesitate to reach out to us.
+
+