Model Card for Model ID
Model Details
- Base Model: Pre-training
- Model Description: This model translates English text into Korean.
- Developed by: Platform Develop Div. at 2Bytescorp Korea.
- Model Type: Translation
- Language(s):
  - Source Language: English
  - Target Language: Korean
Training Info
- Training Step/epoch: 400,000 steps
Dataset
- Train Dataset: 12,000,000
- Test Dataset: 1,000,000
- Valid Dataset: 1,000,000
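As a quick sanity check on the split sizes listed above, the following sketch computes each split's share of the total. The card does not state the unit of these counts, so treating them as individual examples is an assumption. The variable names are illustrative only.

```python
# Split sizes as listed in this card (unit not stated; assumed to be examples).
splits = {"train": 12_000_000, "test": 1_000_000, "valid": 1_000_000}

total = sum(splits.values())

# Fraction of the full corpus held by each split.
ratios = {name: count / total for name, count in splits.items()}

for name, ratio in ratios.items():
    print(f"{name}: {splits[name]:,} ({ratio:.1%})")
```

With these numbers the split works out to roughly 85.7% train, 7.1% test, and 7.1% validation.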
Training Data
- Dataset: Our own Korean/English dataset.
How to Get Started With the Model (Inference)
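import sys

import ctranslate2
import pyonmttok

# Use the sentence passed on the command line, or fall back to a default example.
if len(sys.argv) < 2:
    sentence = "I sincerely apologize for not providing the best taste and quality."
else:
    sentence = sys.argv[1]

# Tokenize the source sentence.
tokenizer = pyonmttok.Tokenizer("conservative", joiner_annotate=True)
tokens = tokenizer(sentence)

# Load the converted CTranslate2 model (replace the path with your local model directory).
model = "/home/techops/data/nmt_data/clean_data_files_v1/ctranslate2/model_4m"
translator = ctranslate2.Translator(model_path=model, device="cpu")

# Translate with beam search and keep the top-scoring hypothesis.
outputs = translator.translate_batch([tokens], beam_size=5, num_hypotheses=2, sampling_temperature=0.8, replace_unknowns=True)
translated = outputs[0].hypotheses[0]

# Detokenize and strip any leftover "@@" subword markers.
t_s = tokenizer.detokenize(translated)
print(t_s.replace("@@", ""))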
>>>
(nmt) [techops@inf-ai-nmt-a01 (screen: ) /data/NMT/2b_nmt/ctranslate]$ python ctran_translate.py
최고의 맛과 품질을 제공하지 못한 점에 대해 진심으로 사과드립니다.
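The script above translates one sentence per run. To translate several sentences in one batched call, a small wrapper along the following lines may help. This is a minimal sketch, not part of the original script. The function name `translate_sentences` and its parameters are hypothetical, and the tokenizer, translator, and detokenizer are passed in as callables so the helper does not depend on a particular model path.

```python
from typing import Callable, List, Sequence


def translate_sentences(
    sentences: Sequence[str],
    tokenize: Callable[[str], List[str]],
    translate_batch: Callable[..., list],
    detokenize: Callable[[List[str]], str],
    beam_size: int = 5,
) -> List[str]:
    """Translate several sentences in a single batched call (hypothetical helper)."""
    # Tokenize every input sentence up front so the model sees one batch.
    batch = [tokenize(s) for s in sentences]
    results = translate_batch(batch, beam_size=beam_size)
    # Keep only the top-scoring hypothesis for each input, then strip
    # leftover "@@" subword markers as the single-sentence script does.
    return [detokenize(r.hypotheses[0]).replace("@@", "") for r in results]
```

Assuming `tokenizer` and `translator` are created as in the script above, a call might look like `translate_sentences(["Hello.", "Thank you."], tokenizer, translator.translate_batch, tokenizer.detokenize)`.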