This is a COMET quality estimation model by Unbabel. It receives a source sentence and its translation and returns a score that reflects the quality of the translation.
Paper
CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task (Rei et al., WMT 2022)
License:
cc-by-nc-sa-4.0
Usage for Inference Endpoint
import json
import requests
API_URL = ""
API_TOKEN="MY_API_KEY"
headers = {
"Authorization": f"Bearer {API_TOKEN}",
"Content-Type": "application/json",
}
def query(url, headers, payload):
    # Serialize the payload, POST it to the endpoint, and decode the JSON response.
    data = json.dumps(payload)
    response = requests.request("POST", url, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8"))
payload = {
"inputs": {
"batch_size": 8,
"workers": None,
"data": [
{
"src": "You'll be picking fruit and generally helping us do all the usual farm work",
"mt": "๋น์ ์ ๊ณผ์ผ์ ๋ฐ๊ธฐ๋ ํ๊ณ ๋์ฒด๋ก ์ฐ๋ฆฌ๊ฐ ํ๋ ์ผ์์ ์ธ ๋์ฅ ์ผ์ ๋๊ฒ ๋ ๊ฒ๋๋ค",
},{
"src": "You'll be picking fruit and generally helping us do all the usual farm work",
"mt": "๋น์ ์ ๊ณผ์ผ์ ๋ฐ๊ธฐ๋ ํ๊ณ ๋์ฒด๋ก ์ฐ๋ฆฌ๊ฐ ํ๋ ์ผ์์ ์ธ ๋์ฅ ์ผ์ ๋๊ฒ ๋ ๊ฒ๋๋ค",
},{
"src": "You'll be picking fruit and generally helping us do all the usual farm work",
"mt": "๋น์ ์ ๊ณผ์ผ์ ๋ฐ๊ธฐ๋ ํ๊ณ ๋์ฒด๋ก ์ฐ๋ฆฌ๊ฐ ํ๋ ์ผ์์ ์ธ ๋์ฅ ์ผ์ ๋๊ฒ ๋ ๊ฒ๋๋ค",
},{
"src": "You'll be picking fruit and generally helping us do all the usual farm work",
"mt": "๋น์ ์ ๊ณผ์ผ์ ๋ฐ๊ธฐ๋ ํ๊ณ ๋์ฒด๋ก ์ฐ๋ฆฌ๊ฐ ํ๋ ์ผ์์ ์ธ ๋์ฅ ์ผ์ ๋๊ฒ ๋ ๊ฒ๋๋ค",
},{
"src": "You'll be picking fruit and generally helping us do all the usual farm work",
"mt": "๋น์ ์ ๊ณผ์ผ์ ๋ฐ๊ธฐ๋ ํ๊ณ ๋์ฒด๋ก ์ฐ๋ฆฌ๊ฐ ํ๋ ์ผ์์ ์ธ ๋์ฅ ์ผ์ ๋๊ฒ ๋ ๊ฒ๋๋ค",
},{
"src": "You'll be picking fruit and generally helping us do all the usual farm work",
"mt": "๋น์ ์ ๊ณผ์ผ์ ๋ฐ๊ธฐ๋ ํ๊ณ ๋์ฒด๋ก ์ฐ๋ฆฌ๊ฐ ํ๋ ์ผ์์ ์ธ ๋์ฅ ์ผ์ ๋๊ฒ ๋ ๊ฒ๋๋ค",
},{
"src": "You'll be picking fruit and generally helping us do all the usual farm work",
"mt": "๋น์ ์ ๊ณผ์ผ์ ๋ฐ๊ธฐ๋ ํ๊ณ ๋์ฒด๋ก ์ฐ๋ฆฌ๊ฐ ํ๋ ์ผ์์ ์ธ ๋์ฅ ์ผ์ ๋๊ฒ ๋ ๊ฒ๋๋ค",
},
]
}
}
scores = query(API_URL, headers, payload)
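The payload above can also be assembled from two parallel lists instead of being written out by hand. The following is a minimal sketch, assuming the endpoint accepts the same `inputs` layout shown above; `build_payload` is a hypothetical helper introduced here for illustration and is not part of COMET or of the endpoint API, and the example sentences are made up.

```python
def build_payload(sources, translations, batch_size=8, workers=None):
    """Pair each source with its translation in the payload layout shown above."""
    if len(sources) != len(translations):
        raise ValueError("sources and translations must have the same length")
    # One {"src": ..., "mt": ...} record per sentence pair.
    data = [{"src": src, "mt": mt} for src, mt in zip(sources, translations)]
    return {"inputs": {"batch_size": batch_size, "workers": workers, "data": data}}


# Illustrative sentences only (hypothetical data).
example_payload = build_payload(
    ["You'll be picking fruit."],
    ["Vous cueillerez des fruits."],
)
```

The resulting dictionary can be passed to `query` in place of the hand-written `payload`.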
Intended uses
Unbabel's model is intended to be used for reference-free MT evaluation.
Given a source text and its translation, it outputs a single score between 0 and 1, where 1 represents a perfect translation.
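One way such scores might be used is to flag low-scoring translations for human review. The following is a minimal sketch, assuming the scores are available as a plain list of floats aligned with the input pairs (the endpoint's response format is not specified here). The helper `flag_low_quality`, the threshold of 0.5, and the example scores are all hypothetical and chosen only for illustration.

```python
def flag_low_quality(pairs, scores, threshold=0.5):
    """Return (pair, score) tuples whose score falls below the threshold."""
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return [(pair, score) for pair, score in zip(pairs, scores) if score < threshold]


pairs = [
    {"src": "Good morning", "mt": "Bonjour"},
    {"src": "Good night", "mt": "Bonjour"},
]
scores = [0.9, 0.3]  # made-up values, not real model output
flagged = flag_low_quality(pairs, scores)
```

A suitable threshold depends on the language pair and the use case, so it should be tuned on data with known quality rather than taken from this sketch.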
Languages Covered:
This model builds on top of InfoXLM, which covers the following languages:
Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.
Thus, results for language pairs containing uncovered languages are unreliable!
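Before scoring, it can help to check that both languages of a pair appear in the list above. The following is a minimal sketch, assuming languages are identified by the English names used in that list; the helper `uncovered_languages` is hypothetical, the match is case-sensitive, and the names "Sanskrit", "Scottish Gaelic", and "Western Frisian" are spelled here in their standard forms.

```python
# Language names copied from the coverage list above.
COVERED = {
    name.strip()
    for name in """
    Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani,
    Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton,
    Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional),
    Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino,
    Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa,
    Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish,
    Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean,
    Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian,
    Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya,
    Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian,
    Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian,
    Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized,
    Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized,
    Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish
    """.split(",")
}


def uncovered_languages(*languages):
    """Return the given language names that are missing from the covered set."""
    return [lang for lang in languages if lang.strip() not in COVERED]


# Maltese does not appear in the list above, so this pair would be flagged.
missing = uncovered_languages("English", "Maltese")
```

An empty result means every language passed in is on the list; any name returned marks a pair whose scores should be treated as unreliable.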