Update README.md
README.md
CHANGED
@@ -2,6 +2,26 @@
|
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
|
5 |
## Usage
|
6 |
|
7 |
```python
|
@@ -18,4 +38,22 @@ inputs = tokenizer(sentence, return_tensors="pt")
|
|
18 |
translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["arg_Latn"])
|
19 |
|
20 |
print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True))
|
21 |
-
```
|
2 |
license: apache-2.0
|
3 |
---
|
4 |
|
5 |
+
## Overview
|
6 |
+
This model was presented at the [WMT24 Shared Task on Translation into Low-Resource Languages of Spain](https://www2.statmt.org/wmt24/romance-task.html)
|
7 |
+
as a submission by the [Transducens](https://transducens.dlsi.ua.es/) team from the [Universitat d'Alacant](https://www.ua.es/). It is a many-to-many model
|
8 |
+
capable of translating between several languages of the Iberian Peninsula.
|
9 |
+
|
10 |
+
**The model is based on [NLLB-1.3B](https://huggingface.co/facebook/nllb-200-1.3B), fine-tuned for the following language pairs:**
|
11 |
+
+ Spanish ↔ Asturian
|
12 |
+
+ Spanish ↔ Aragonese
|
13 |
+
+ Spanish ↔ Aranese
|
14 |
+
+ Spanish ↔ Galician
|
15 |
+
+ Spanish ↔ Catalan
|
16 |
+
+ Spanish ↔ Valencian
|
17 |
+
+ Catalan ↔ Aranese
|
18 |
+
|
19 |
+
**The new language tokens are:**
|
20 |
+
+ Aragonese: arg_Latn
|
21 |
+
+ Aranese: arn_Latn
|
22 |
+
+ Valencian: val_Latn
|
23 |
+
|
24 |
+
|
25 |
## Usage
|
26 |
|
27 |
```python
|
|
|
38 |
translated_tokens = model.generate(**inputs, forced_bos_token_id=tokenizer.lang_code_to_id["arg_Latn"])
|
39 |
|
40 |
print(tokenizer.batch_decode(translated_tokens, skip_special_tokens=True))
|
41 |
+
```
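
The language pairs and new tokens listed in the Overview can be wired into the usage snippet roughly as follows. This is a minimal sketch, not the authors' code: `LANG_CODES`, `SUPPORTED_PAIRS`, `resolve_direction` and `translate` are hypothetical names, and the codes for Spanish, Asturian, Galician and Catalan are assumed to be the standard NLLB-200 ones (`spa_Latn`, `ast_Latn`, `glg_Latn`, `cat_Latn`). The model and tokenizer are passed in as arguments because the lines that load them are elided in this diff. The target token id is looked up with `convert_tokens_to_ids`, since some recent `transformers` releases deprecate `lang_code_to_id` on the NLLB tokenizer.

```python
# Language codes for this model. The three new tokens come from the README;
# the other four are assumed to be the unchanged NLLB-200 codes.
LANG_CODES = {
    "spanish": "spa_Latn",
    "asturian": "ast_Latn",
    "galician": "glg_Latn",
    "catalan": "cat_Latn",
    "aragonese": "arg_Latn",
    "aranese": "arn_Latn",
    "valencian": "val_Latn",
}

# Directions listed in the README, stored as unordered pairs because
# every listed pair is bidirectional.
SUPPORTED_PAIRS = {
    frozenset(pair)
    for pair in [
        ("spanish", "asturian"),
        ("spanish", "aragonese"),
        ("spanish", "aranese"),
        ("spanish", "galician"),
        ("spanish", "catalan"),
        ("spanish", "valencian"),
        ("catalan", "aranese"),
    ]
}


def resolve_direction(source, target):
    """Return (src_code, tgt_code); raise ValueError for an unlisted pair."""
    src, tgt = source.lower(), target.lower()
    if frozenset((src, tgt)) not in SUPPORTED_PAIRS:
        raise ValueError(f"{source} <-> {target} is not a listed fine-tuning pair")
    return LANG_CODES[src], LANG_CODES[tgt]


def translate(sentence, source, target, model, tokenizer):
    """Translate one sentence with an already loaded model and tokenizer."""
    src_code, tgt_code = resolve_direction(source, target)
    tokenizer.src_lang = src_code  # tells the NLLB tokenizer the source language
    inputs = tokenizer(sentence, return_tensors="pt")
    translated_tokens = model.generate(
        **inputs,
        # Force the first generated token to be the target language token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_code),
    )
    return tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)
```

Rejecting unlisted pairs is a conservative choice for this sketch. The underlying many-to-many model may still produce output for other directions, but those are not among the pairs the README says it was fine-tuned on.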
|
42 |
+
|
43 |
+
## Citation
|
44 |
+
If you use this model, please cite it as follows:
|
45 |
+
```
|
46 |
+
@inproceedings{wmt2024-galiano-jimenez,
|
47 |
+
title = "Universitat d'{A}lacant's Submission to the {WMT} 2024 {S}hared {T}ask on {T}ranslating into {L}ow-{R}esource {L}anguages of {S}pain",
|
48 |
+
author = "Galiano-Jim{\'e}nez, Aar{\'o}n and S{\'a}nchez-Cartagena, V{\'i}ctor M. and P{\'e}rez-Ortiz, Juan Antonio and S{\'a}nchez-Mart{\'i}nez, Felipe",
|
49 |
+
editor = "Koehn, Philipp and Haddow, Barry and Kocmi, Tom and Monz, Christof",
|
50 |
+
booktitle = "Proceedings of the Ninth Conference on Machine Translation",
|
51 |
+
month = nov,
|
52 |
+
year = "2024",
|
53 |
+
address = "Miami",
|
54 |
+
publisher = "Association for Computational Linguistics",
|
55 |
+
}
|
56 |
+
```
|
57 |
+
|
58 |
+
## Acknowledgements
|
59 |
+
This model has been produced as part of the research project [Lightweight neural translation technologies for low-resource languages (LiLowLa)](https://transducens.dlsi.ua.es/lilowla/) (PID2021-127999NB-I00), funded by the Spanish Ministry of Science and Innovation (MCIN), the Spanish Research Agency (AEI/10.13039/501100011033) and the European Regional Development Fund "A way to make Europe".
|