---
license: mit
datasets:
- oscar-corpus/OSCAR-2301
- allenai/nllb
- Helsinki-NLP/opus-100
language:
- en
- da
- nl
- de
- is
- 'no'
- sv
- af
- ca
- ro
- gl
- it
- pt
- es
- bg
- mk
- sr
- uk
- ru
- id
- ms
- th
- vi
- mg
- fr
- hu
- el
- cs
- pl
- lt
- lv
- ka
- zh
- ja
- ko
- fi
- et
- gu
- hi
- mr
- ne
- ur
- az
- kk
- ky
- tr
- uz
- ar
- he
- fa
base_model:
- haoranxu/ALMA-13B-Pretrain
---
[X-ALMA](https://arxiv.org/pdf/2410.03115) builds upon [ALMA-R](https://arxiv.org/pdf/2401.08417) by expanding support from 6 to 50 languages. It utilizes a plug-and-play architecture with language-specific modules, complemented by a carefully designed training recipe. This release includes the **X-ALMA pre-trained base model**.
```
@misc{xu2024xalmaplugplay,
title={X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale},
author={Haoran Xu and Kenton Murray and Philipp Koehn and Hieu Hoang and Akiko Eriguchi and Huda Khayrallah},
year={2024},
eprint={2410.03115},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2410.03115},
}
```
X-ALMA-13B-Pretrain is pre-trained on 50 languages: en,da,nl,de,is,no,sv,af,ca,ro,gl,it,pt,es,bg,mk,sr,uk,ru,id,ms,th,vi,mg,fr,hu,el,cs,pl,lt,lv,ka,zh,ja,ko,fi,et,gu,hi,mr,ne,ur,az,kk,ky,tr,uz,ar,he,fa.
All X-ALMA checkpoints are released on Hugging Face:
| Models | Model Link | Description |
|:-------------:|:---------------:|:---------------:|
| X-ALMA | [haoranxu/X-ALMA](https://huggingface.co./haoranxu/X-ALMA) | X-ALMA model with all its modules |
| X-ALMA-13B-Pretrain | [haoranxu/X-ALMA-13B-Pretrain](https://huggingface.co./haoranxu/X-ALMA-13B-Pretrain) | X-ALMA 13B multilingual pre-trained base model |
| X-ALMA-Group1 | [haoranxu/X-ALMA-13B-Group1](https://huggingface.co./haoranxu/X-ALMA-13B-Group1) | X-ALMA group1 specific module and the merged model |
| X-ALMA-Group2 | [haoranxu/X-ALMA-13B-Group2](https://huggingface.co./haoranxu/X-ALMA-13B-Group2) | X-ALMA group2 specific module and the merged model |
| X-ALMA-Group3 | [haoranxu/X-ALMA-13B-Group3](https://huggingface.co./haoranxu/X-ALMA-13B-Group3) | X-ALMA group3 specific module and the merged model |
| X-ALMA-Group4 | [haoranxu/X-ALMA-13B-Group4](https://huggingface.co./haoranxu/X-ALMA-13B-Group4) | X-ALMA group4 specific module and the merged model |
| X-ALMA-Group5 | [haoranxu/X-ALMA-13B-Group5](https://huggingface.co./haoranxu/X-ALMA-13B-Group5) | X-ALMA group5 specific module and the merged model |
| X-ALMA-Group6 | [haoranxu/X-ALMA-13B-Group6](https://huggingface.co./haoranxu/X-ALMA-13B-Group6) | X-ALMA group6 specific module and the merged model |
| X-ALMA-Group7 | [haoranxu/X-ALMA-13B-Group7](https://huggingface.co./haoranxu/X-ALMA-13B-Group7) | X-ALMA group7 specific module and the merged model |
| X-ALMA-Group8 | [haoranxu/X-ALMA-13B-Group8](https://huggingface.co./haoranxu/X-ALMA-13B-Group8) | X-ALMA group8 specific module and the merged model |
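If you only need the files of a single checkpoint from this table (for example, for offline use), they can be fetched ahead of time. Below is a minimal sketch using `huggingface_hub` (not part of the original card); `haoranxu/X-ALMA-13B-Group6` is chosen only because Group 6 covers Chinese, as shown in the group mapping further down.
```
# Sketch: pre-download one of the checkpoints listed above. Group 6 is used here
# because it is the group that covers Chinese.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="haoranxu/X-ALMA-13B-Group6")
print(local_dir)  # local cache directory containing the model files
```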
## A quick start:
There are three ways to load X-ALMA for translation. The examples below translate "我爱机器翻译。" into English (X-ALMA can also handle multilingual open-ended QA).
**The first way**: loading the merged model, in which the language-specific module has already been merged into the base model **(Recommended)**:
```
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer
from peft import PeftModel
# Map each language to its group so the matching merged checkpoint can be selected.
GROUP2LANG = {
    1: ["da", "nl", "de", "is", "no", "sv", "af"],
    2: ["ca", "ro", "gl", "it", "pt", "es"],
    3: ["bg", "mk", "sr", "uk", "ru"],
    4: ["id", "ms", "th", "vi", "mg", "fr"],
    5: ["hu", "el", "cs", "pl", "lt", "lv"],
    6: ["ka", "zh", "ja", "ko", "fi", "et"],
    7: ["gu", "hi", "mr", "ne", "ur"],
    8: ["az", "kk", "ky", "tr", "uz", "ar", "he", "fa"],
}
LANG2GROUP = {lang: str(group) for group, langs in GROUP2LANG.items() for lang in langs}
group_id = LANG2GROUP["zh"]

model = AutoModelForCausalLM.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", padding_side='left')

# Add the source sentence into the prompt template
prompt = "Translate this from Chinese to English:\nChinese: 我爱机器翻译。\nEnglish:"

# X-ALMA needs the chat template, while ALMA and ALMA-R do not.
chat_style_prompt = [{"role": "user", "content": prompt}]
prompt = tokenizer.apply_chat_template(chat_style_prompt, tokenize=False, add_generation_prompt=True)

input_ids = tokenizer(prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()

# Translation
with torch.no_grad():
    generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9)
outputs = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
print(outputs)
```
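Note that `generate` returns the prompt tokens followed by the continuation, so decoding the full sequence also prints the prompt. An optional sketch (not part of the original example) that keeps only the translation:
```
# Optional: decode only the newly generated tokens, dropping the echoed prompt.
translation = tokenizer.batch_decode(generated_ids[:, input_ids.shape[1]:], skip_special_tokens=True)[0]
print(translation)
```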
**The second way**: loading the base model and language-specific module **(Recommended)**:
```
model = AutoModelForCausalLM.from_pretrained("haoranxu/X-ALMA-13B-Pretrain", torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, f"haoranxu/X-ALMA-13B-Group{group_id}")
tokenizer = AutoTokenizer.from_pretrained(f"haoranxu/X-ALMA-13B-Group{group_id}", padding_side='left')
```
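Prompting and generation then work exactly as in the first example. If a single set of weights is preferred for inference, the adapter can optionally be folded into the base model; this is a sketch assuming the group module is a standard PEFT LoRA adapter, not a required step:
```
# Optional sketch: merge the language-specific LoRA module into the base weights,
# then build the prompt and call generate() as in the first example.
model = model.merge_and_unload()
```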
**The third way**: loading the base model together with all language-specific modules, MoE-style **(requires large GPU memory)**:
```
from modeling_xalma import XALMAForCausalLM
model = XALMAForCausalLM.from_pretrained("haoranxu/X-ALMA", torch_dtype=torch.float16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("haoranxu/X-ALMA", padding_side='left')
# Pass `lang="zh"` to tell the model which language group's module to use during generation; this argument is only needed for this third loading method.
generated_ids = model.generate(input_ids=input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9, lang="zh")
```
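A benefit of this MoE-style loading is that a different language group can be served from the same model instance just by changing `lang`. A hedged sketch reusing the prompt pattern above; the German sentence and `lang="de"` are illustrative only (German belongs to Group 1):
```
# Sketch: a second request from the same model instance, switched to the Group 1
# module (which covers German) by passing lang="de".
de_prompt = "Translate this from German to English:\nGerman: Ich liebe maschinelle Übersetzung.\nEnglish:"
de_prompt = tokenizer.apply_chat_template([{"role": "user", "content": de_prompt}], tokenize=False, add_generation_prompt=True)
de_input_ids = tokenizer(de_prompt, return_tensors="pt", padding=True, max_length=40, truncation=True).input_ids.cuda()

with torch.no_grad():
    de_generated_ids = model.generate(input_ids=de_input_ids, num_beams=5, max_new_tokens=20, do_sample=True, temperature=0.6, top_p=0.9, lang="de")
print(tokenizer.batch_decode(de_generated_ids, skip_special_tokens=True))
```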