language: en
widget:
  - text: >-
      On the other hand, a decline of the arsenic content in hair and nail was
      observed after withdrawal of the drug.
  - text: These differences in gene expression have not been molecularly defined.
  - text: p65 was detected in the cytoplasm of FDC , whereas nuclei were negative.
datasets:
  - Genia

A Biomedical POS Tagger for English

Fine-tuned from the dmis-lab/biobert-base-cased-v1.2 checkpoint on the GENIA corpus.

Evaluation (row labels are tag ids, listed under Tags):

              precision    recall  f1-score   support

           0       0.98      1.00      0.99       263
           3       0.93      1.00      0.97        14
           5       1.00      1.00      1.00         8
           6       0.99      0.99      0.99       169
           7       1.00      1.00      1.00       203
           8       0.99      1.00      1.00       195
           9       0.95      0.78      0.85        98
          10       0.83      1.00      0.91         5
          11       0.96      0.97      0.96       532
          12       1.00      1.00      1.00       252
          13       0.99      0.98      0.99      1575
          14       0.95      0.95      0.95       133
          15       0.89      0.89      0.89         9
          16       1.00      1.00      1.00         3
          18       0.99      1.00      0.99        69
          19       1.00      0.95      0.98        22
          20       0.99      1.00      1.00       395
          22       1.00      1.00      1.00      1328
          23       1.00      1.00      1.00       987
          24       1.00      1.00      1.00         6
          25       0.00      0.00      0.00         0
          26       1.00      1.00      1.00       620
          27       0.00      0.00      0.00         1
          28       1.00      1.00      1.00        39
          29       0.98      0.99      0.98      5674
          30       0.97      0.96      0.96      2075
          31       1.00      0.71      0.83         7
          32       1.00      0.80      0.89         5
          33       1.00      1.00      1.00        58
          34       1.00      1.00      1.00         2
          35       0.96      0.96      0.96       336
          37       0.99      1.00      1.00      1579
          38       1.00      1.00      1.00      1446
          39       1.00      0.98      0.99        57

    accuracy                           0.99     18165
   macro avg       0.92      0.91      0.91     18165
weighted avg       0.99      0.99      0.99     18165

F1: 0.985267446136761
Accuracy: 0.9853564547206166
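
The gap between the macro average (0.91) and the weighted average (0.99) comes from a few rare tags: ids 25 (PDT) and 27 (PP) score 0.00 with a support of 0 and 1. The sketch below recomputes both averages from the rounded per-tag values in the report. It is an illustration only, not the author's evaluation script, and because the inputs are rounded to two decimals the weighted figure comes out slightly below the reported one.

```python
# (tag id, f1-score, support), copied from the report above.
ROWS = [
    (0, 0.99, 263), (3, 0.97, 14), (5, 1.00, 8), (6, 0.99, 169),
    (7, 1.00, 203), (8, 1.00, 195), (9, 0.85, 98), (10, 0.91, 5),
    (11, 0.96, 532), (12, 1.00, 252), (13, 0.99, 1575), (14, 0.95, 133),
    (15, 0.89, 9), (16, 1.00, 3), (18, 0.99, 69), (19, 0.98, 22),
    (20, 1.00, 395), (22, 1.00, 1328), (23, 1.00, 987), (24, 1.00, 6),
    (25, 0.00, 0), (26, 1.00, 620), (27, 0.00, 1), (28, 1.00, 39),
    (29, 0.98, 5674), (30, 0.96, 2075), (31, 0.83, 7), (32, 0.89, 5),
    (33, 1.00, 58), (34, 1.00, 2), (35, 0.96, 336), (37, 1.00, 1579),
    (38, 1.00, 1446), (39, 0.99, 57),
]

total_support = sum(support for _, _, support in ROWS)

# Macro average: every tag counts equally, however rare it is.
macro_f1 = sum(f1 for _, f1, _ in ROWS) / len(ROWS)

# Weighted average: each tag counts in proportion to its support.
weighted_f1 = sum(f1 * support for _, f1, support in ROWS) / total_support

print(f"tokens={total_support} macro={macro_f1:.2f} weighted={weighted_f1:.3f}")
```

Frequent tags such as NN (id 29) and JJ (id 30) dominate the weighted average, so it tracks overall token accuracy, while the macro average is more sensitive to tags that barely appear in the evaluation set.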

Tags:

{0: 'VBD',
 1: 'N',
 2: 'XT',
 3: 'JJS',
 4: 'E2A',
 5: 'WRB',
 6: 'VB',
 7: 'TO',
 8: 'VBP',
 9: 'FW',
 10: 'EX',
 11: 'VBN',
 12: 'VBZ',
 13: 'NNS',
 14: 'VBG',
 15: 'RBR',
 16: 'WP',
 17: 'CT',
 18: 'PRP',
 19: 'JJR',
 20: 'CC',
 21: 'NNPS',
 22: 'CD',
 23: 'DT',
 24: 'NNP',
 25: 'PDT',
 26: 'LS',
 27: 'PP',
 28: 'PRP$',
 29: 'NN',
 30: 'JJ',
 31: 'RP',
 32: 'RBS',
 33: 'MD',
 34: 'WP$',
 35: 'RB',
 36: 'SYM',
 37: 'IN',
 38: 'PUNCT',
 39: 'WDT',
 40: 'POS',
 41: '<pad>'}
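
As a minimal sketch of how this mapping can be applied, the snippet below converts a sequence of predicted tag ids into tag strings and drops padding positions. The helper name `decode_tags` and the example ids are hypothetical; only the id-to-tag table comes from this card.

```python
# Id-to-tag table, copied from the mapping above.
ID2TAG = {
    0: 'VBD', 1: 'N', 2: 'XT', 3: 'JJS', 4: 'E2A', 5: 'WRB', 6: 'VB',
    7: 'TO', 8: 'VBP', 9: 'FW', 10: 'EX', 11: 'VBN', 12: 'VBZ',
    13: 'NNS', 14: 'VBG', 15: 'RBR', 16: 'WP', 17: 'CT', 18: 'PRP',
    19: 'JJR', 20: 'CC', 21: 'NNPS', 22: 'CD', 23: 'DT', 24: 'NNP',
    25: 'PDT', 26: 'LS', 27: 'PP', 28: 'PRP$', 29: 'NN', 30: 'JJ',
    31: 'RP', 32: 'RBS', 33: 'MD', 34: 'WP$', 35: 'RB', 36: 'SYM',
    37: 'IN', 38: 'PUNCT', 39: 'WDT', 40: 'POS', 41: '<pad>',
}
PAD_ID = 41


def decode_tags(predicted_ids):
    """Map predicted ids to tag strings, skipping padding positions."""
    return [ID2TAG[i] for i in predicted_ids if i != PAD_ID]


# Hypothetical prediction for a five-token input padded to length 7.
example_ids = [23, 29, 12, 30, 38, PAD_ID, PAD_ID]
example_tags = decode_tags(example_ids)
print(example_tags)
```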

Parameters:

nepochs = 30 (training stopped at epoch 18)
batch_size = 32
batch_status = 32
learning_rate = 1e-5
early_stop = 3
max_length = 200
checkpoint = dmis-lab/biobert-base-cased-v1.2
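
The card does not spell out what `early_stop = 3` means. One plausible reading, assumed here, is a patience value: training ends once the validation score has failed to improve for three consecutive epochs. The sketch below illustrates that rule with invented scores; the function name and the score values are hypothetical and are not taken from the actual training run.

```python
def early_stopping_epoch(val_scores, patience=3, max_epochs=30):
    """Return the 1-based epoch at which training would end."""
    best = float("-inf")
    epochs_without_gain = 0
    last_epoch = 0
    for epoch, score in enumerate(val_scores[:max_epochs], start=1):
        last_epoch = epoch
        if score > best:
            # New best score: reset the patience counter.
            best = score
            epochs_without_gain = 0
        else:
            epochs_without_gain += 1
            if epochs_without_gain >= patience:
                break
    return last_epoch


# Invented scores: 15 epochs of improvement, then no further gain.
scores = [0.90 + 0.005 * e for e in range(15)] + [0.96] * 15
stopped_at = early_stopping_epoch(scores)
print(stopped_at)
```

Under this assumption, a run capped at 30 epochs that ends at epoch 18 would have reached its best validation score at epoch 15.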

For more details, see: https://github.com/lisaterumi/postagger-bio-english