agaliano committed: Update README.md
Commit f73ce2b (parent: e54dff5)

Files changed (1):
  1. README.md (+39 -1)

README.md CHANGED
---
license: apache-2.0
---
## Overview

This model was presented at the [WMT24 Shared Task on Translation into Low-Resource Languages of Spain](https://www2.statmt.org/wmt24/romance-task.html) as a submission by the [Transducens](https://transducens.dlsi.ua.es/) team from the [Universitat d'Alacant](https://www.ua.es/). It is a many-to-many model capable of translating between several languages of the Iberian Peninsula.

**The model is based on [NLLB-1.3B](https://huggingface.co/facebook/nllb-200-1.3B), fine-tuned for the following language pairs:**
+ Spanish ↔ Asturian
+ Spanish ↔ Aragonese
+ Spanish ↔ Aranese
+ Spanish ↔ Galician
+ Spanish ↔ Catalan
+ Spanish ↔ Valencian
+ Catalan ↔ Aranese

**The new language tokens are:**
+ Aragonese: `arg_Latn`
+ Aranese: `arn_Latn`
+ Valencian: `val_Latn`

(A consolidated mapping of these and the remaining language codes is sketched below.)

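The list above gives only the three newly added codes. The following is a hedged reference mapping for all the languages involved, assuming (this README does not confirm it) that the fine-tune keeps the standard NLLB-200 codes for the languages that already had one:

```python
# Assumed mapping: the last three codes are the new tokens listed above; the
# others are the standard NLLB-200 codes, assumed to be kept by this fine-tune.
LANG_CODES = {
    "Spanish": "spa_Latn",
    "Asturian": "ast_Latn",
    "Galician": "glg_Latn",
    "Catalan": "cat_Latn",
    "Aragonese": "arg_Latn",  # new token
    "Aranese": "arn_Latn",    # new token
    "Valencian": "val_Latn",  # new token
}

# A code is passed as `src_lang` when loading the tokenizer and, via
# `tokenizer.lang_code_to_id[...]`, as `forced_bos_token_id` when generating
# (see the Usage section below).
```
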
## Usage

```python
# ... (model and tokenizer setup lines are not shown in this excerpt) ...
inputs = tokenizer(sentence, return_tensors="pt")

translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["arg_Latn"])

print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True))
```
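
The snippet above omits the setup lines. A minimal end-to-end sketch, assuming the standard Hugging Face Transformers seq2seq API and a hypothetical repository id (replace it with this model's actual Hub id), could look like this:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Hypothetical placeholder: substitute the real repository id of this model.
model_id = "Transducens/<model-name>"

# NLLB-style tokenizer: src_lang selects the source-language token (Spanish here).
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="spa_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

sentence = "Hola, ¿cómo estás?"  # Spanish input

inputs = tokenizer(sentence, return_tensors="pt")

# Force decoding into Aragonese (arg_Latn); swap in arn_Latn (Aranese) or
# val_Latn (Valencian) to target the other newly added languages.
translated_tokens = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.lang_code_to_id["arg_Latn"],
)

print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True))
```

Changing `src_lang` and the forced target code selects any of the other directions listed in the Overview.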

## Citation

If you use this model, please cite it as follows:

```
@inproceedings{wmt2024-galiano-jimenez,
  title = "Universitat d'{A}lacant's Submission to the {WMT} 2024 {S}hared {T}ask on {T}ranslating into {L}ow-{R}esource {L}anguages of {S}pain",
  author = "Galiano-Jim{\'e}nez, Aar{\'o}n and S{\'a}nchez-Cartagena, V{\'i}ctor M and P{\'e}rez-Ortiz, Juan Antonio and S{\'a}nchez-Mart{\'i}nez, Felipe",
  editor = "Koehn, Philipp and Haddow, Barry and Kocmi, Tom and Monz, Christof",
  booktitle = "Proceedings of the Ninth Conference on Machine Translation",
  month = nov,
  year = "2024",
  address = "Miami",
  publisher = "Association for Computational Linguistics",
}
```

## Acknowledgements

This model has been produced as part of the research project [Lightweight neural translation technologies for low-resource languages (LiLowLa)](https://transducens.dlsi.ua.es/lilowla/) (PID2021-127999NB-I00), funded by the Spanish Ministry of Science and Innovation (MCIN), the Spanish Research Agency (AEI/10.13039/501100011033) and the European Regional Development Fund "A way to make Europe".