|
--- |
|
license: cc-by-nc-nd-4.0 |
|
language: ko |
|
tags: |
|
- KakaoBrain |
|
- KoGPT |
|
- GPT |
|
- GPT3 |
|
--- |
|
|
|
# KakaoBrain project KoGPT |
|
|
|
KakaoBrain's Pre-Trained Language Models. |
|
|
|
* KakaoBrain project KoGPT (Korean Generative Pre-trained Transformer) |
|
* [https://github.com/kakaobrain/kogpt](https://github.com/kakaobrain/kogpt) |
|
* [https://huggingface.co./kakaobrain/kogpt](https://huggingface.co./kakaobrain/kogpt) |
|
|
|
|
|
## Model Descriptions |
|
|
|
### KoGPT6B-ryan1.5b |
|
|
|
* [\[huggingface\]\[kakaobrain/kogpt\]\[KoGPT6B-ryan1.5b\]](https://huggingface.co./kakaobrain/kogpt/tree/KoGPT6B-ryan1.5b) |
|
* [\[huggingface\]\[kakaobrain/kogpt\]\[KoGPT6B-ryan1.5b-float16\]](https://huggingface.co./kakaobrain/kogpt/tree/KoGPT6B-ryan1.5b-float16) |
|
|
|
| Hyperparameter | Value | |
|
|:---------------------|--------------:| |
|
| \\(n_{parameters}\\) | 6,166,502,400 | |
|
| \\(n_{layers}\\) | 28 | |
|
| \\(d_{model}\\) | 4,096 | |
|
| \\(d_{ff}\\) | 16,384 | |
|
| \\(n_{heads}\\) | 16 | |
|
| \\(d_{head}\\) | 256 | |
|
| \\(n_{ctx}\\) | 2,048 | |
|
| \\(n_{vocab}\\) | 64,512 | |
|
| Positional Encoding | [Rotary Position Embedding (RoPE)](https://arxiv.org/abs/2104.09864) | |
|
| RoPE Dimensions | 64 | |
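
These values can be cross-checked against the configuration shipped with the checkpoint. A minimal sketch, assuming the checkpoint exposes GPT-J-style configuration fields (`n_layer`, `n_embd`, `n_head`, `n_positions`, `vocab_size`, `rotary_dim`):

```python
from transformers import AutoConfig

# Inspect the checkpoint configuration; no model weights are downloaded for this.
config = AutoConfig.from_pretrained('kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b')

print(config.n_layer)      # 28     -> n_layers
print(config.n_embd)       # 4096   -> d_model
print(config.n_head)       # 16     -> n_heads
print(config.n_positions)  # 2048   -> n_ctx
print(config.vocab_size)   # 64512  -> n_vocab
print(config.rotary_dim)   # 64     -> RoPE dimensions
```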
|
|
|
|
|
## Hardware requirements |
|
|
|
### KoGPT6B-ryan1.5b |
|
|
|
#### GPU |
|
The following is the minimum recommended GPU hardware for running this model.

* at least `32GB` of GPU RAM
|
|
|
### KoGPT6B-ryan1.5b-float16 |
|
|
|
#### GPU |
|
The following is the minimum recommended GPU hardware for running this model; a rough pre-flight check is sketched after this list.

* half precision requires an NVIDIA GPU based on the Volta, Turing, or Ampere architecture

* at least `16GB` of GPU RAM
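
Before loading the half-precision checkpoint, it can help to verify that the GPU meets this guidance. A minimal pre-flight sketch; the thresholds simply mirror the bullet points above (16GB of GPU RAM, compute capability 7.0 or newer for Volta/Turing/Ampere) and are not an official requirement check:

```python
import torch

# Rough pre-flight check mirroring the guidance above; the exact memory headroom
# needed also depends on batch size and sequence length.
props = torch.cuda.get_device_properties(0)
total_gib = props.total_memory / 1024 ** 3
print(f'{props.name}: {total_gib:.1f} GiB, compute capability {props.major}.{props.minor}')

if props.major < 7 or total_gib < 16:
    raise RuntimeError('GPU does not meet the float16 guidance above')
```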
|
|
|
|
|
## Usage |
|
|
|
### prompt |
|
```bash |
|
python -m kogpt --help |
|
usage: KoGPT inference [-h] [--model MODEL] [--revision {KoGPT6B-ryan1.5b}] |
|
[--device {cpu,cuda}] [-d] |
|
|
|
KakaoBrain Korean(hangul) Generative Pre-Training Model |
|
|
|
optional arguments: |
|
-h, --help show this help message and exit |
|
--model MODEL huggingface repo (default:kakaobrain/kogpt) |
|
--revision {KoGPT6B-ryan1.5b} |
|
--device {cpu,cuda} (default:cuda) |
|
-d, --debug |
|
``` |
|
|
|
```bash |
|
python -m kogpt |
|
prompt> 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던
|
temperature(0.8)> |
|
max_length(128)> 64 |
|
인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
|
|
|
prompt> |
|
... |
|
``` |
|
|
|
|
|
### python |
|
```python |
|
import torch |
|
from transformers import AutoTokenizer, AutoModelForCausalLM |
|
|
|
tokenizer = AutoTokenizer.from_pretrained( |
|
'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16', # or float32 version: revision=KoGPT6B-ryan1.5b |
|
bos_token='[BOS]', eos_token='[EOS]', unk_token='[UNK]', pad_token='[PAD]', mask_token='[MASK]' |
|
) |
|
model = AutoModelForCausalLM.from_pretrained( |
|
'kakaobrain/kogpt', revision='KoGPT6B-ryan1.5b-float16', # or float32 version: revision=KoGPT6B-ryan1.5b |
|
pad_token_id=tokenizer.eos_token_id, |
|
torch_dtype='auto', low_cpu_mem_usage=True |
|
).to(device='cuda', non_blocking=True) |
|
_ = model.eval() |
|
|
|
prompt = '인간처럼 생각하고, 행동하는 \'지능\'을 통해 인류가 이제까지 풀지 못했던'
|
with torch.no_grad(): |
|
tokens = tokenizer.encode(prompt, return_tensors='pt').to(device='cuda', non_blocking=True) |
|
gen_tokens = model.generate(tokens, do_sample=True, temperature=0.8, max_length=64) |
|
generated = tokenizer.batch_decode(gen_tokens)[0] |
|
|
|
print(generated)  # print: 인간처럼 생각하고, 행동하는 '지능'을 통해 인류가 이제까지 풀지 못했던 문제의 해답을 찾을 수 있을 것이다. 과학기술이 고도로 발달한 21세기를 살아갈 우리 아이들에게 가장 필요한 것은 사고력 훈련이다. 사고력 훈련을 통해, 세상
|
``` |
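
The example above samples with `temperature=0.8`, matching the CLI defaults. A small follow-up sketch, reusing the `model`, `tokenizer`, and `prompt` objects defined above, that contrasts sampled and greedy decoding; the `top_p` value is an illustrative choice, not a recommended setting:

```python
# Reuses `model`, `tokenizer`, and `prompt` from the snippet above (loaded on CUDA).
with torch.no_grad():
    tokens = tokenizer.encode(prompt, return_tensors='pt').to(device='cuda', non_blocking=True)

    # Sampling: stochastic continuations; lower temperature gives more conservative text.
    sampled = model.generate(tokens, do_sample=True, temperature=0.8, top_p=0.95, max_length=128)

    # Greedy decoding: deterministic, but often repetitive for open-ended prompts.
    greedy = model.generate(tokens, do_sample=False, max_length=128)

print(tokenizer.batch_decode(sampled)[0])
print(tokenizer.batch_decode(greedy)[0])
```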
|
|
|
|
|
## Experiments |
|
|
|
### In-context Few-Shots |
|
|
|
| Models | #params | NSMC (Acc.) | YNAT (F1) | KLUE-STS (F1) | |
|
|:--------------|--------:|------------:|----------:|--------------:| |
|
| HyperCLOVA[1] | 1.3B | 83.9 | 58.7 | 60.9 | |
|
| HyperCLOVA[1] | 6.9B | 83.8 | 67.5 | 59.3 | |
|
| HyperCLOVA[1] | 13.0B | 87.9 | 67.9 | 60.0 | |
|
| HyperCLOVA[1] | 39.0B | 88.0 | 71.4 | 61.6 | |
|
| HyperCLOVA[1] | 82.0B | **88.2** | 72.7 | **65.1** | |
|
| **Ours** | 6.0B | 87.8 | **78.0** | 64.3 | |
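
For context, in-context few-shot evaluation feeds a handful of labelled examples to the model as plain text and reads off the label it generates for an unlabelled query; no weights are updated. Below is a minimal sketch of what such a prompt might look like for NSMC (binary sentiment on movie reviews); the template, label words, and examples are illustrative assumptions, not the exact prompts used to produce the table above:

```python
# Illustrative few-shot prompt for NSMC-style sentiment classification.
# The template, label words ('긍정' = positive, '부정' = negative), and examples
# are assumptions for illustration, not the evaluation setup used above.
shots = [
    ('이 영화 정말 재미있었어요', '긍정'),          # "This movie was really fun"
    ('시간 낭비, 정말 최악의 영화였다', '부정'),    # "A waste of time, truly the worst movie"
]
query = '배우들의 연기가 인상 깊었다'               # "The actors' performances were impressive"

prompt = ''.join(f'리뷰: {text}\n감정: {label}\n\n' for text, label in shots)
prompt += f'리뷰: {query}\n감정:'

# Generating a short continuation of `prompt` (e.g. model.generate with a small
# max_new_tokens) and mapping it back to one of the label words gives the prediction.
print(prompt)
```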
|
|
|
|
|
### Finetuning / P-Tuning |
|
|
|
|
|
Issues with our downstream evaluation have been reported in [kakaobrain/kogpt#17](https://github.com/kakaobrain/kogpt/issues/17).

The previously published performance evaluation table has been removed: the baseline algorithms differed from ours and their measurement methodology could not be verified, so the comparison could not be considered fair.

Please refer to the issue linked above for the original evaluation table and the troubleshooting results.
|
|
|
|
|
|
|
## Limitations |
|
|
|
KakaoBrain `KoGPT` was trained on the `ryan dataset`, a dataset known to contain profanity, lewd, politically charged, and other harsh language.

Therefore, `KoGPT` can generate socially unacceptable text. As with all language models, it is difficult to predict in advance how `KoGPT` will respond to a particular prompt, and it may produce offensive content without warning.
|
|
|
Primarily Korean: `KoGPT` is trained primarily on Korean text and is best suited to classifying, searching, summarizing, or generating such text.

By default, `KoGPT` performs worse on inputs that differ from the distribution it was trained on, including non-Korean text as well as Korean dialects that are not well represented in the training data.
|
|
|
[comment]: <> (If abnormal or socially unacceptable text is generated during testing, please send a "prompt" and the "generated text" to [[email protected]](mailto:[email protected]). ) |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
## Citation |
|
|
|
If you apply this library or model to any project or research, please cite our code:
|
|
|
``` |
|
@misc{kakaobrain2021kogpt, |
|
title = {KoGPT: KakaoBrain Korean(hangul) Generative Pre-trained Transformer}, |
|
author = {Ildoo Kim and Gunsoo Han and Jiyeon Ham and Woonhyuk Baek}, |
|
year = {2021}, |
|
howpublished = {\url{https://github.com/kakaobrain/kogpt}}, |
|
} |
|
``` |
|
|
|
|
|
## Contact |
|
|
|
KoGPT is released as open source in the hope that it will help many research institutes and startups for research purposes. We look forward to hearing from organizations that wish to cooperate with us.
|
|
|
[[email protected]](mailto:[email protected]) |
|
|
|
|
|
## License |
|
|
|
The `source code` of KakaoBrain `KoGPT` is licensed under the [Apache 2.0](LICENSE.apache-2.0) license.

The `pretrained weights` of KakaoBrain `KoGPT` are licensed under the [CC-BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/) license.
|
|
|
If you use the model, code, or pretrained weights, please comply with the respective license terms. The full license texts are available in the [Apache 2.0](LICENSE.apache-2.0) and [LICENSE.cc-by-nc-nd-4.0](LICENSE.cc-by-nc-nd-4.0) files.
|
|
|
|
|
## References |
|
|
|
[1] [HyperCLOVA](https://arxiv.org/abs/2109.04650): Kim, Boseop, et al. "What changes can large-scale language models bring? Intensive study on HyperCLOVA: billions-scale Korean generative pretrained transformers." arXiv preprint arXiv:2109.04650 (2021).
|
|