Commit dae15a6
Parent(s): 97e69d3

Update README.md

README.md (changed)
```diff
@@ -13,15 +13,33 @@ should probably proofread and complete it, then remove this comment. -->
 
 # ke_t5_base_bongsoo_ko_en
 
-This model is a fine-tuned version of [KETI-AIR/ke-t5-base](https://huggingface.co/KETI-AIR/ke-t5-base)
+This model is a fine-tuned version of [KETI-AIR/ke-t5-base](https://huggingface.co/KETI-AIR/ke-t5-base)
+on the [bongsoo/news_talk_ko_en](https://huggingface.co/datasets/bongsoo/news_talk_ko_en) dataset.
 
 ## Model description
 
-
+KE-T5, developed by KETI (Korea Electronics Technology Institute), is a
+T5 (Text-to-Text Transfer Transformer) model pretrained on a Korean and English corpus.
+The vocabulary used by KE-T5 consists of 64,000 sub-word tokens
+and was created with Google's SentencePiece.
+The SentencePiece model was trained to cover 99.95% of a 30GB corpus
+with a roughly 7:3 mix of Korean and English.
 
 ## Intended uses & limitations
 
-
+Translation from Korean to English
+
+## Usage
+
+You can use this model directly with a translation pipeline:
+
+```python
+>>> from transformers import pipeline
+>>> translator = pipeline('translation', model='chunwoolee0/ke_t5_base_bongsoo_en_ko')
+
+>>> translator("Let us go for a walk after lunch.")
+[{'translation_text': '점심을 마치고 산책을 하러 가자.'}]
+```
 
 ## Training and evaluation data
 
```
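The usage example translates one sentence per call. As a minimal sketch, assuming the translation pipeline accepts a list of strings and returns one `{'translation_text': ...}` dict per input (the single-sentence output above has that dict shape), a small helper can send several sentences in one call and unwrap the results into plain strings. `translate_batch`, `fake_translator`, and the `Translator` alias are hypothetical names introduced here, not part of `transformers`; the fake translator only mimics the output shape so the sketch runs offline without downloading the model.

```python
from typing import Callable, Dict, List

# Assumption: a translation pipeline takes a list of strings and returns
# one {"translation_text": ...} dict per input string.
Translator = Callable[[List[str]], List[Dict[str, str]]]


def translate_batch(translator: Translator, sentences: List[str]) -> List[str]:
    """Hypothetical helper: translate several sentences, return plain strings."""
    if not sentences:
        # Nothing to translate, so skip calling the pipeline entirely.
        return []
    outputs = translator(sentences)
    if len(outputs) != len(sentences):
        raise ValueError("expected one translation per input sentence")
    # Unwrap each {"translation_text": ...} dict into its string.
    return [out["translation_text"] for out in outputs]


def fake_translator(batch: List[str]) -> List[Dict[str, str]]:
    """Hypothetical offline stand-in with the same output shape as the pipeline."""
    return [{"translation_text": f"<translated: {text}>"} for text in batch]


result = translate_batch(
    fake_translator, ["Let us go for a walk after lunch.", "Good morning."]
)
print(result)
```

With the real model, the `translator` object created by `pipeline('translation', model=...)` in the usage example would be passed in place of `fake_translator`. Keeping the pipeline as a parameter rather than constructing it inside the helper lets the same function be exercised offline and reused with other translation checkpoints.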