Ben Nguyen

Inference endpoint

	---
	extra_gated_heading: Acknowledge license to accept the repository
	extra_gated_button_content: Acknowledge license
	pipeline_tag: translation
	language:
	- multilingual
	- af
	- am
	- ar
	- as
	- az
	- be
	- bg
	- bn
	- br
	- bs
	- ca
	- cs
	- cy
	- da
	- de
	- el
	- en
	- eo
	- es
	- et
	- eu
	- fa
	- fi
	- fr
	- fy
	- ga
	- gd
	- gl
	- gu
	- ha
	- he
	- hi
	- hr
	- hu
	- hy
	- id
	- is
	- it
	- ja
	- jv
	- ka
	- kk
	- km
	- kn
	- ko
	- ku
	- ky
	- la
	- lo
	- lt
	- lv
	- mg
	- mk
	- ml
	- mn
	- mr
	- ms
	- my
	- ne
	- nl
	- no
	- om
	- or
	- pa
	- pl
	- ps
	- pt
	- ro
	- ru
	- sa
	- sd
	- si
	- sk
	- sl
	- so
	- sq
	- sr
	- su
	- sv
	- sw
	- ta
	- te
	- th
	- tl
	- tr
	- ug
	- uk
	- ur
	- uz
	- vi
	- xh
	- yi
	- zh
	license: cc-by-nc-sa-4.0
	---

	This is a [COMET](https://github.com/Unbabel/COMET) quality estimation model by Unbabel. It receives a source sentence and its translation and returns a score that reflects the quality of the translation.

	# Paper

	[CometKiwi: IST-Unbabel 2022 Submission for the Quality Estimation Shared Task](https://aclanthology.org/2022.wmt-1.60) (Rei et al., WMT 2022)

	# License:

	cc-by-nc-sa-4.0

	# Usage for Inference Endpoint

	```python
	import requests

	# Fill in the URL of your deployed Inference Endpoint and your access token.
	API_URL = ""
	API_TOKEN = "MY_API_KEY"
	headers = {
	    "Authorization": f"Bearer {API_TOKEN}",
	    "Content-Type": "application/json",
	}


	def query(url, headers, payload):
	    """POST the payload as JSON and return the decoded JSON response."""
	    response = requests.post(url, headers=headers, json=payload)
	    return response.json()


	# One source sentence ("src") and its machine translation ("mt").
	sample = {
	    "src": "You'll be picking fruit and generally helping us do all the usual farm work",
	    "mt": "당신은 과일을 따기도 하고 대체로 우리가 하는 일상적인 농장 일을 돕게 될 겁니다",
	}

	payload = {
	    "inputs": {
	        "batch_size": 8,
	        "workers": None,  # serialized as JSON null
	        # Seven copies of the same sentence pair, scored in a single request.
	        "data": [sample] * 7,
	    }
	}

	scores = query(API_URL, headers, payload)
	```
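
When the source sentences and translations live in two parallel lists, the request body can be assembled programmatically. The following is a minimal sketch, assuming the endpoint accepts the `inputs` layout used in the example request; `build_payload` is a hypothetical helper name, not part of COMET or of the Hugging Face API.

```python
def build_payload(sources, translations, batch_size=8, workers=None):
    """Pair each source with its translation in the request layout.

    Hypothetical helper: the "inputs", "data", "src" and "mt" keys are copied
    from the example request; nothing else about the endpoint is assumed.
    """
    if len(sources) != len(translations):
        raise ValueError("sources and translations must have the same length")
    data = [{"src": src, "mt": mt} for src, mt in zip(sources, translations)]
    return {"inputs": {"batch_size": batch_size, "workers": workers, "data": data}}


payload = build_payload(
    ["You'll be picking fruit and generally helping us do all the usual farm work"],
    ["당신은 과일을 따기도 하고 대체로 우리가 하는 일상적인 농장 일을 돕게 될 겁니다"],
)
```

Raising on a length mismatch keeps a misaligned corpus from being scored silently, since `zip` would otherwise drop the unmatched sentences.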

	# Intended uses

	Unbabel's model is intended to be used for reference-free MT evaluation.

	Given a source text and its translation, the model outputs a single score between 0 and 1, where 1 represents a perfect translation.
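
One way such scores might be used downstream is to flag translations for human review. This is only a sketch: it assumes you already hold one float per sentence pair, and both the sample scores and the 0.5 threshold are made up for illustration rather than recommended by the model authors.

```python
def flag_low_quality(pairs, scores, threshold=0.5):
    """Return (src, mt, score) triples whose score is below the threshold.

    The threshold is an illustrative assumption; pick one by inspecting
    scores on your own data.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    return [
        (src, mt, score)
        for (src, mt), score in zip(pairs, scores)
        if score < threshold
    ]


# Hypothetical sentence pairs and scores, not real model output.
pairs = [("Hello", "Bonjour"), ("Good night", "Bonjour")]
scores = [0.86, 0.31]
flagged = flag_low_quality(pairs, scores)
```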

	# Languages Covered:

	This model builds on top of InfoXLM, which covers the following languages:

	Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Basque, Belarusian, Bengali, Bengali Romanized, Bosnian, Breton, Bulgarian, Burmese, Catalan, Chinese (Simplified), Chinese (Traditional), Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Filipino, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Hausa, Hebrew, Hindi, Hindi Romanized, Hungarian, Icelandic, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Khmer, Korean, Kurdish (Kurmanji), Kyrgyz, Lao, Latin, Latvian, Lithuanian, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Norwegian, Oriya, Oromo, Pashto, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tamil, Tamil Romanized, Telugu, Telugu Romanized, Thai, Turkish, Ukrainian, Urdu, Urdu Romanized, Uyghur, Uzbek, Vietnamese, Welsh, Western Frisian, Xhosa, Yiddish.

	Thus, results for language pairs containing uncovered languages are unreliable!
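
A coverage check can be run before scoring so that unreliable pairs are caught early. The sketch below assumes the two-letter codes listed under `language:` in this card's metadata mirror the covered languages; `COVERED` and `is_pair_covered` are hypothetical names introduced for illustration.

```python
# Language codes copied from the `language:` list in this card's metadata
# (the generic "multilingual" entry is left out).
COVERED = set("""
af am ar as az be bg bn br bs ca cs cy da de el en eo es et eu fa fi fr fy ga
gd gl gu ha he hi hr hu hy id is it ja jv ka kk km kn ko ku ky la lo lt lv mg
mk ml mn mr ms my ne nl no om or pa pl ps pt ro ru sa sd si sk sl so sq sr su
sv sw ta te th tl tr ug uk ur uz vi xh yi zh
""".split())


def is_pair_covered(src_lang, tgt_lang):
    """Return True only if both language codes appear in the covered set."""
    return src_lang.lower() in COVERED and tgt_lang.lower() in COVERED


covered = is_pair_covered("en", "ko")      # both codes are in the list
not_covered = is_pair_covered("en", "zu")  # "zu" is not in the list
```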