---
license: gemma
---
A GPT-Small model built from scratch for learning purposes.
It uses the tokenizer from Gemma 2-2B-JPN-IT and is trained on the Japanese data from JESC.
```bibtex
@ARTICLE{pryzant_jesc_2018,
author = {{Pryzant}, R. and {Chung}, Y. and {Jurafsky}, D. and {Britz}, D.},
title = "{JESC: Japanese-English Subtitle Corpus}",
journal = {Language Resources and Evaluation Conference (LREC)},
keywords = {Computer Science - Computation and Language},
year = 2018
}
```