---
extra_gated_heading: Acknowledge license to accept the repository
extra_gated_button_content: Acknowledge license
pipeline_tag: translation
language:
- multilingual
- af
- am
- ar
- as
- az
- be
- bg
- bn
- br
- bs
- ca
- cs
- cy
- da
- de
- el
- en
- eo
- es
- et
- eu
- fa
- fi
- fr
- fy
- ga
- gd
- gl
- gu
- ha
- he
- hi
- hr
- hu
- hy
- id
- is
- it
- ja
- jv
- ka
- kk
- km
- kn
- ko
- ku
- ky
- la
- lo
- lt
- lv
- mg
- mk
- ml
- mn
- mr
- ms
- my
- ne
- nl
- no
- om
- or
- pa
- pl
- ps
- pt
- ro
- ru
- sa
- sd
- si
- sk
- sl
- so
- sq
- sr
- su
- sv
- sw
- ta
- te
- th
- tl
- tr
- ug
- uk
- ur
- uz
- vi
- xh
- yi
- zh
license: cc-by-nc-sa-4.0
---
This is a [COMET](https://github.com/Unbabel/COMET) quality estimation model by Unbabel. It receives a source sentence and its translation and returns a score that reflects the quality of the translation.
# Paper
[CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task](https://aclanthology.org/2022.wmt-1.60) (Rei et al., WMT 2022)
# License:
cc-by-nc-sa-4.0
# Usage for Inference Endpoint
```python
import json

import requests

API_URL = ""  # URL of your deployed Inference Endpoint
API_TOKEN = "MY_API_KEY"

headers = {
    "Authorization": f"Bearer {API_TOKEN}",
    "Content-Type": "application/json",
}


def query(url, headers, payload):
    """POST the JSON payload to the endpoint and return the decoded JSON response."""
    data = json.dumps(payload)
    response = requests.request("POST", url, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))

payload = {
    "inputs": {
        "batch_size": 8,
        "workers": None,
        # Each item pairs a source sentence ("src") with its machine translation ("mt").
        "data": [
            {
                "src": "You'll be picking fruit and generally helping us do all the usual farm work",
                "mt": "당신은 과일을 따기도 하고 대체로 우리가 하는 일상적인 농장 일을 돕게 될 겁니다",
            },
            {
                "src": "You'll be picking fruit and generally helping us do all the usual farm work",
                "mt": "당신은 과일을 따기도 하고 대체로 우리가 하는 일상적인 농장 일을 돕게 될 겁니다",
            },
            {
                "src": "You'll be picking fruit and generally helping us do all the usual farm work",
                "mt": "당신은 과일을 따기도 하고 대체로 우리가 하는 일상적인 농장 일을 돕게 될 겁니다",
            },
            {
                "src": "You'll be picking fruit and generally helping us do all the usual farm work",
                "mt": "당신은 과일을 따기도 하고 대체로 우리가 하는 일상적인 농장 일을 돕게 될 겁니다",
            },
            {
                "src": "You'll be picking fruit and generally helping us do all the usual farm work",
                "mt": "당신은 과일을 따기도 하고 대체로 우리가 하는 일상적인 농장 일을 돕게 될 겁니다",
            },
            {
                "src": "You'll be picking fruit and generally helping us do all the usual farm work",
                "mt": "당신은 과일을 따기도 하고 대체로 우리가 하는 일상적인 농장 일을 돕게 될 겁니다",
            },
            {
                "src": "You'll be picking fruit and generally helping us do all the usual farm work",
                "mt": "당신은 과일을 따기도 하고 대체로 우리가 하는 일상적인 농장 일을 돕게 될 겁니다",
            },
        ],
    }
}

scores = query(API_URL, headers, payload)
```
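When the sources and translations live in two parallel lists, the request body can be assembled programmatically. The sketch below only mirrors the payload shape used in the example above; `build_payload` is a hypothetical helper name, not part of COMET or of the endpoint.

```python
from typing import Any, Dict, List, Optional


def build_payload(
    sources: List[str],
    translations: List[str],
    batch_size: int = 8,
    workers: Optional[int] = None,
) -> Dict[str, Any]:
    """Pair each source with its translation in the request shape shown above.

    Hypothetical helper: the key names ("inputs", "batch_size", "workers",
    "data", "src", "mt") are copied from the example payload in this README.
    """
    if len(sources) != len(translations):
        raise ValueError("sources and translations must have the same length")
    data = [{"src": src, "mt": mt} for src, mt in zip(sources, translations)]
    return {"inputs": {"batch_size": batch_size, "workers": workers, "data": data}}


example = build_payload(["Good morning."], ["Guten Morgen."])
```

The resulting dictionary can be passed as `payload` to a request function such as the `query` function in the example above.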
# Intended uses
Unbabel's model is intended to be used for **reference-free MT evaluation**.
Given a source text and its translation, it outputs a single score between 0 and 1, where 1 represents a perfect translation.
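Because each segment gets its own score, a common follow-up is to average the scores and flag low-scoring segments for review. The sketch below assumes you already hold the segment scores as a plain list of floats; `summarize_scores` and the `0.5` threshold are illustrative assumptions, not values recommended by the model authors.

```python
from statistics import mean
from typing import Any, Dict, List


def summarize_scores(scores: List[float], threshold: float = 0.5) -> Dict[str, Any]:
    """Average segment-level scores and list the indices that fall below a threshold.

    Hypothetical helper: the default threshold is an arbitrary illustration.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    flagged = [index for index, score in enumerate(scores) if score < threshold]
    return {"mean_score": mean(scores), "flagged": flagged}


summary = summarize_scores([0.9, 0.3, 0.6])
# summary["flagged"] holds the positions of segments scoring below the threshold.
```

A threshold should be tuned on your own data, since score distributions may differ across language pairs and domains.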
# Languages Covered:
This model builds on top of InfoXLM, which covers the following languages:
Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.
Thus, results for language pairs containing uncovered languages are unreliable!
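The coverage caveat can be turned into a simple guard that runs before scoring. The sketch below uses the language codes listed in this card's metadata header; `is_pair_covered` is a hypothetical helper, and reducing a tag such as `pt-BR` to its primary subtag is an assumption about how you label your data.

```python
# Language codes copied from the metadata header of this model card.
COVERED_LANGUAGES = frozenset(
    """
    af am ar as az be bg bn br bs ca cs cy da de el en eo es et eu fa fi fr fy
    ga gd gl gu ha he hi hr hu hy id is it ja jv ka kk km kn ko ku ky la lo lt
    lv mg mk ml mn mr ms my ne nl no om or pa pl ps pt ro ru sa sd si sk sl so
    sq sr su sv sw ta te th tl tr ug uk ur uz vi xh yi zh
    """.split()
)


def primary_subtag(language_tag: str) -> str:
    """Reduce a tag such as 'pt-BR' or 'zh_CN' to its lowercase primary subtag."""
    return language_tag.replace("_", "-").split("-")[0].lower()


def is_pair_covered(source_lang: str, target_lang: str) -> bool:
    """Return True only if both languages of the pair appear in the covered set."""
    return (
        primary_subtag(source_lang) in COVERED_LANGUAGES
        and primary_subtag(target_lang) in COVERED_LANGUAGES
    )


english_korean_ok = is_pair_covered("en", "ko")
```

Scores for pairs that fail this check should be treated as unreliable rather than silently mixed with the rest.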