Update README.md
Browse files
README.md
CHANGED
@@ -21,7 +21,63 @@ The model is intended to be used with the dialect identification system that is
|
|
21 |
|
22 |
|
23 |
## How to use
|
|
|
|
|
|
|
|
|
24 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
25 |
|
26 |
## Citation
|
27 |
```bibtex
|
|
|
21 |
|
22 |
|
23 |
## How to use
|
24 |
+
```python
|
25 |
+
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
|
26 |
+
from camel_tools.dialectid import DIDModel6
|
27 |
+
import torch
|
28 |
|
29 |
+
DID = DIDModel6.pretrained()
|
30 |
+
DA_PHRASE_MAP = {'BEI': 'في بيروت منقول',
|
31 |
+
'CAI': 'في القاهرة بنقول',
|
32 |
+
'DOH': 'في الدوحة نقول',
|
33 |
+
'RAB': 'في الرباط كنقولو',
|
34 |
+
'TUN': 'في تونس نقولو'}
|
35 |
+
|
36 |
+
|
37 |
+
def predict_dialect(sent):
    """Predict the dialect of a sentence using the CAMeL Tools MADAR-6 DID model.

    Args:
        sent (str): Input sentence.

    Returns:
        tuple: ``(name, score)`` — the highest-scoring dialect label and its
        score. When the model's top prediction is MSA, the runner-up label is
        returned instead, so the result is always a city-dialect key usable
        with ``DA_PHRASE_MAP``.
    """
    prediction = DID.predict([sent])[0]

    # Rank all labels by score once (descending). The original code duplicated
    # this sort and the `.scores` lookup in both branches of the if/else,
    # differing only in which rank was taken.
    ranked = sorted(prediction.scores.items(), key=lambda item: item[1],
                    reverse=True)

    # If MSA wins, fall back to the second-best label; otherwise take the top.
    name, score = ranked[1] if prediction.top == "MSA" else ranked[0]
    return name, score
|
58 |
+
|
59 |
+
tokenizer = AutoTokenizer.from_pretrained('CAMeL-Lab/arat5-coda-did')
|
60 |
+
model = AutoModelForSeq2SeqLM.from_pretrained('CAMeL-Lab/arat5-coda-did')
|
61 |
+
|
62 |
+
text = 'اتنين هامبورجر و اتنين قهوة، لو سمحت. عايزهم تيك اواي.'
|
63 |
+
|
64 |
+
pred_dialect, _ = predict_dialect(text)
|
65 |
+
text = DA_PHRASE_MAP[pred_dialect] + ' ' + text
|
66 |
+
|
67 |
+
inputs = tokenizer(text, return_tensors='pt')
|
68 |
+
gen_kwargs = {'num_beams': 5, 'max_length': 200,
|
69 |
+
'num_return_sequences': 1,
|
70 |
+
'no_repeat_ngram_size': 0, 'early_stopping': False
|
71 |
+
}
|
72 |
+
|
73 |
+
codafied_text = model.generate(**inputs, **gen_kwargs)
|
74 |
+
codafied_text = tokenizer.batch_decode(codafied_text,
|
75 |
+
skip_special_tokens=True,
|
76 |
+
clean_up_tokenization_spaces=False)[0]
|
77 |
+
|
78 |
+
print(codafied_text)
|
79 |
+
"اثنين هامبورجر واثنين قهوة، لو سمحت. عايزهم تيك اوي."
|
80 |
+
```
|
81 |
|
82 |
## Citation
|
83 |
```bibtex
|