# Pretrained ELECTRA Language Model for Korean (bw-electra-base-discriminator)
## Usage

### Load Model and Tokenizer
```python
from transformers import ElectraModel, TFElectraModel, ElectraTokenizer

# TensorFlow
model = TFElectraModel.from_pretrained("ifuseok/bw-electra-base-discriminator")

# PyTorch (converts the TensorFlow checkpoint on the fly)
# model = ElectraModel.from_pretrained("ifuseok/bw-electra-base-discriminator", from_tf=True)

tokenizer = ElectraTokenizer.from_pretrained("ifuseok/bw-electra-base-discriminator", do_lower_case=False)
```
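
As a quick sanity check that the weights loaded correctly, the model can be run on a tokenized sentence to obtain contextual hidden states. This is a minimal sketch continuing from the TensorFlow model and tokenizer loaded above; the example sentence is an arbitrary one chosen for illustration.

```python
# Tokenize an arbitrary Korean sentence and run a forward pass.
inputs = tokenizer("전기차 시장이 빠르게 성장하고 있다.", return_tensors="tf")
outputs = model(inputs)

# last_hidden_state has shape (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```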

### Tokenizer example
```python
from transformers import ElectraTokenizer

tokenizer = ElectraTokenizer.from_pretrained("ifuseok/bw-electra-base-discriminator")
# "We release the Big Wave ELECTRA model."
tokenizer.tokenize("[CLS] Big Wave ELECTRA 모델을 공개합니다. [SEP]")
```
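
The tokenizer follows the standard WordPiece interface from `transformers`, so encoding and decoding round-trip as usual. A small sketch, continuing from the tokenizer loaded above:

```python
# encode() adds [CLS]/[SEP] automatically; decode() reverses the mapping.
ids = tokenizer.encode("Big Wave ELECTRA 모델을 공개합니다.")
print(ids)
print(tokenizer.convert_ids_to_tokens(ids))
print(tokenizer.decode(ids, skip_special_tokens=True))
```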

### Example using ElectraForPreTraining (PyTorch)
```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizer

discriminator = ElectraForPreTraining.from_pretrained("ifuseok/bw-electra-base-discriminator", from_tf=True)
tokenizer = ElectraTokenizer.from_pretrained("ifuseok/bw-electra-base-discriminator", do_lower_case=False)

sentence = "아무것도 하기가 싫다."       # "I don't feel like doing anything."
fake_sentence = "아무것도 하기가 좋다."  # one token replaced

fake_tokens = tokenizer.tokenize(fake_sentence)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="pt")

discriminator_outputs = discriminator(fake_inputs)
# Positive logits -> 1.0 (predicted replaced), non-positive -> 0.0 (predicted original)
predictions = torch.round((torch.sign(discriminator_outputs[0]) + 1) / 2)

print(list(zip(fake_tokens, predictions.tolist()[0][1:-1])))
```
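
The sign-and-round trick above maps positive logits to 1 (token predicted as replaced) and non-positive logits to 0 (predicted original). If graded scores are preferred over a hard 0/1 decision, the logits can be passed through a sigmoid instead; a minimal sketch continuing from the variables above:

```python
# Per-token probability that each token was replaced (higher = more likely fake).
# [1:-1] drops the [CLS] and [SEP] positions so it lines up with fake_tokens.
probs = torch.sigmoid(discriminator_outputs[0])[0][1:-1]
for token, p in zip(fake_tokens, probs.tolist()):
    print(f"{token}\t{p:.3f}")
```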

### Example using ElectraForPreTraining (TensorFlow)
```python
import tensorflow as tf
from transformers import TFElectraForPreTraining, ElectraTokenizer

discriminator = TFElectraForPreTraining.from_pretrained("ifuseok/bw-electra-base-discriminator")
tokenizer = ElectraTokenizer.from_pretrained("ifuseok/bw-electra-base-discriminator", do_lower_case=False)

sentence = "아무것도 하기가 싫다."       # "I don't feel like doing anything."
fake_sentence = "아무것도 하기가 좋다."  # one token replaced

fake_tokens = tokenizer.tokenize(fake_sentence)
fake_inputs = tokenizer.encode(fake_sentence, return_tensors="tf")

discriminator_outputs = discriminator(fake_inputs)
# Positive logits -> 1.0 (predicted replaced), non-positive -> 0.0 (predicted original)
predictions = tf.round((tf.sign(discriminator_outputs[0]) + 1) / 2).numpy()

print(list(zip(fake_tokens, predictions.tolist()[0][1:-1])))
```
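
In both examples a prediction of 1.0 marks a token the discriminator believes was replaced, and 0.0 marks a token it believes is original. A small formatting sketch, reusing `fake_tokens` and `predictions` from the TensorFlow example above:

```python
# Pair each token with a readable label instead of a raw 0/1 score.
for token, pred in zip(fake_tokens, predictions.tolist()[0][1:-1]):
    print(f"{token}\t{'replaced' if pred == 1.0 else 'original'}")
```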