Commit d3769bc by oroszgy
1 parent: be921c0

Update spacy pipeline to 3.3.1

README.md CHANGED
@@ -14,72 +14,72 @@ model-index:
  metrics:
  - name: NER Precision
  type: precision
- value: 0.8607281725
+ value: 0.847826087
  - name: NER Recall
  type: recall
- value: 0.8561884669
+ value: 0.8570675105
  - name: NER F Score
  type: f_score
- value: 0.858452318
+ value: 0.8524217521
  - task:
  name: TAG
  type: token-classification
  metrics:
  - name: TAG (XPOS) Accuracy
  type: accuracy
- value: 0.9638738696
+ value: 0.9677018039
  - task:
  name: POS
  type: token-classification
  metrics:
  - name: POS (UPOS) Accuracy
  type: accuracy
- value: 0.9649265515
+ value: 0.9675104072
  - task:
  name: MORPH
  type: token-classification
  metrics:
  - name: Morph (UFeats) Accuracy
  type: accuracy
- value: 0.9324337257
+ value: 0.9386544167
  - task:
  name: LEMMA
  type: token-classification
  metrics:
  - name: Lemma Accuracy
  type: accuracy
- value: 0.9681370204
+ value: 0.9716773514
  - task:
  name: UNLABELED_DEPENDENCIES
  type: token-classification
  metrics:
  - name: Unlabeled Attachment Score (UAS)
  type: f_score
- value: 0.8211578415
+ value: 0.8069939475
  - task:
  name: LABELED_DEPENDENCIES
  type: token-classification
  metrics:
  - name: Labeled Attachment Score (LAS)
  type: f_score
- value: 0.7476698484
+ value: 0.736004483
  - task:
  name: SENTS
  type: token-classification
  metrics:
  - name: Sentences F-Score
  type: f_score
- value: 0.9788182832
+ value: 0.9821029083
  ---
  Core Hungarian model for HuSpaCy. Components: tok2vec, senter, tagger, morphologizer, lemmatizer, parser, ner

  | Feature | Description |
  | --- | --- |
  | **Name** | `hu_core_news_lg` |
- | **Version** | `3.3.0` |
+ | **Version** | `3.3.1` |
  | **spaCy** | `>=3.3.0,<3.4.0` |
- | **Default Pipeline** | `tok2vec`, `senter`, `tagger`, `morphologizer`, `lemmatizer`, `parser`, `ner` |
- | **Components** | `tok2vec`, `senter`, `tagger`, `morphologizer`, `lemmatizer`, `parser`, `ner` |
+ | **Default Pipeline** | `tok2vec`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `lemmatizer`, `lemma_smoother`, `parser`, `ner` |
+ | **Components** | `tok2vec`, `senter`, `tagger`, `morphologizer`, `lookup_lemmatizer`, `lemmatizer`, `lemma_smoother`, `parser`, `ner` |
  | **Vectors** | -1 keys, 200000 unique vectors (300 dimensions) |
  | **Sources** | [UD Hungarian Szeged](https://universaldependencies.org/treebanks/hu_szeged/index.html) (Richárd Farkas, Katalin Simkó, Zsolt Szántó, Viktor Varga, Veronika Vincze (MTA-SZTE Research Group on Artificial Intelligence))<br />[NYTK-NerKor Corpus](https://github.com/nytud/NYTK-NerKor) (Eszter Simon, Noémi Vadász (Department of Language Technology and Applied Linguistics))<br />[hunNERwiki](http://hlt.sztaki.hu/resources/hunnerwiki.html) (Eszter Simon, Dávid Márk Nemeskey (HLT Group, Budapest University of Technology and Economics))<br />[Szeged NER Corpus](https://rgai.inf.u-szeged.hu/node/130) (György Szarvas, Richárd Farkas, László Felföldi, András Kocsor, János Csirik (MTA-SZTE Research Group on Artificial Intelligence))<br />[Webcorpuswiki word2vec model](https://github.com/oroszgy/hunlp-resources/releases/tag/webcorpuswiki_word2vec_v0.1) (György Orosz) |
  | **License** | `cc-by-sa-4.0` |
@@ -108,18 +108,18 @@ Core Hungarian model for HuSpaCy. Components: tok2vec, senter, tagger, morpholog
  | `TOKEN_P` | 99.86 |
  | `TOKEN_R` | 99.93 |
  | `TOKEN_F` | 99.89 |
- | `SENTS_P` | 97.99 |
- | `SENTS_R` | 97.77 |
- | `SENTS_F` | 97.88 |
- | `TAG_ACC` | 96.39 |
- | `POS_ACC` | 96.49 |
- | `MORPH_ACC` | 93.24 |
- | `MORPH_MICRO_P` | 96.86 |
- | `MORPH_MICRO_R` | 95.85 |
- | `MORPH_MICRO_F` | 96.35 |
- | `LEMMA_ACC` | 96.81 |
- | `DEP_UAS` | 82.12 |
- | `DEP_LAS` | 74.77 |
- | `ENTS_P` | 86.07 |
- | `ENTS_R` | 85.62 |
- | `ENTS_F` | 85.85 |
+ | `SENTS_P` | 97.77 |
+ | `SENTS_R` | 97.55 |
+ | `SENTS_F` | 97.66 |
+ | `TAG_ACC` | 96.31 |
+ | `POS_ACC` | 96.34 |
+ | `MORPH_ACC` | 92.89 |
+ | `MORPH_MICRO_P` | 96.28 |
+ | `MORPH_MICRO_R` | 95.58 |
+ | `MORPH_MICRO_F` | 95.93 |
+ | `LEMMA_ACC` | 97.25 |
+ | `DEP_UAS` | 81.13 |
+ | `DEP_LAS` | 74.49 |
+ | `ENTS_P` | 87.15 |
+ | `ENTS_R` | 83.72 |
+ | `ENTS_F` | 85.40 |
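The NER precision, recall and F-score in the updated `model-index` metadata can be cross-checked against each other, because the reported F-score is consistent with the harmonic mean of precision and recall. The following is a minimal sketch of that check; the helper name `f_score` is illustrative and is not part of the model package or of spaCy.

```python
def f_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (F1)."""
    if precision + recall == 0.0:
        return 0.0  # both inputs are zero, so the F-score is defined as zero here
    return 2.0 * precision * recall / (precision + recall)


# NER values copied from the 3.3.1 model-index metadata in this commit.
ner_p = 0.847826087
ner_r = 0.8570675105
ner_f = f_score(ner_p, ner_r)

# Compare with the reported "NER F Score" value of 0.8524217521.
print(f"recomputed NER F-score: {ner_f:.10f}")
```

The same helper can be applied to any precision/recall pair in `meta.json`, for example the per-type entries under `ents_per_type`.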
config.cfg CHANGED
@@ -1,6 +1,7 @@
  [paths]
- parser_model = "models/hu_core_news_lg-parser-3.3.0/model-best"
- ner_model = "models/hu_core_news_lg-ner-3.3.0/model-best"
+ parser_model = "models/hu_core_news_lg-parser-3.3.1/model-best"
+ ner_model = "models/hu_core_news_lg-ner-3.3.1/model-best"
+ lemmatizer_lookups = "models/hu_core_news_lg-lookup-lemmatizer-3.3.1"
  train = null
  dev = null
  vectors = null
@@ -12,7 +13,7 @@ gpu_allocator = null

  [nlp]
  lang = "hu"
- pipeline = ["tok2vec","senter","tagger","morphologizer","lemmatizer","parser","ner"]
+ pipeline = ["tok2vec","senter","tagger","morphologizer","lookup_lemmatizer","lemmatizer","lemma_smoother","parser","ner"]
  tokenizer = {"@tokenizers":"spacy.Tokenizer.v1"}
  disabled = []
  before_creation = null
@@ -22,6 +23,9 @@ batch_size = 1000

  [components]

+ [components.lemma_smoother]
+ factory = "hu.lemma_smoother"
+
  [components.lemmatizer]
  factory = "trainable_lemmatizer"
  backoff = "orth"
@@ -51,6 +55,11 @@ depth = 4
  window_size = 2
  maxout_pieces = 5

+ [components.lookup_lemmatizer]
+ factory = "hu.lookup_lemmatizer"
+ scorer = {"@scorers":"spacy.lemmatizer_scorer.v1"}
+ source = ${paths.lemmatizer_lookups}
+
  [components.morphologizer]
  factory = "morphologizer"
  extend = false
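The `[nlp]` change above inserts two components around the existing trainable lemmatizer. A small sketch that derives the difference from the two `pipeline` lists in this diff; it uses plain Python lists only and does not load the model, and the variable names are illustrative.

```python
# Pipeline lists copied from the old and new config.cfg [nlp] sections.
old_pipeline = ["tok2vec", "senter", "tagger", "morphologizer",
                "lemmatizer", "parser", "ner"]
new_pipeline = ["tok2vec", "senter", "tagger", "morphologizer",
                "lookup_lemmatizer", "lemmatizer", "lemma_smoother",
                "parser", "ner"]

# Components present only in the new or only in the old pipeline, in order.
added = [name for name in new_pipeline if name not in old_pipeline]
removed = [name for name in old_pipeline if name not in new_pipeline]

# Positions show that the lookup step runs before the trainable lemmatizer
# and the smoother runs after it.
order = {name: new_pipeline.index(name) for name in added + ["lemmatizer"]}

print("added:", added)
print("removed:", removed)
print("positions:", order)
```

With the model package installed, the same list should be visible as `nlp.pipe_names` after `spacy.load("hu_core_news_lg")`, assuming the package registers the custom `hu.*` factories on import.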
hu_core_news_lg-any-py3-none-any.whl CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:6823f3f9d1f5036dcff94d1a5ba074155da18ba6f1a4e8dc6c5c124d9daff3d2
- size 401139364
+ oid sha256:da2b5830c7953203b64049813cbb0f4a181aa47be42744aa800af2fe3a22f0aa
+ size 403094603
lemmatizer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f8a0dd8dc58167511bbcf3998dc1eea156915c72cdbe57f070ab1dabbf1a9b6b
+ oid sha256:58b56ab3836952a8e612a411c2a1958630e46569225dce1cabb138e7fdc17658
  size 64058360
lookup_lemmatizer/lookups.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c7aaa1cfd45a0afd57ada8ccc690ee182485a151bdb133b7b8c9276647ab9e60
+ size 2745978
meta.json CHANGED
@@ -1,7 +1,7 @@
  {
  "lang":"hu",
  "name":"core_news_lg",
- "version":"3.3.0",
+ "version":"3.3.1",
  "description":"Core Hungarian model for HuSpaCy. Components: tok2vec, senter, tagger, morphologizer, lemmatizer, parser, ner",
  "author":"SzegedAI, MILAB",
  "email":"[email protected]",
@@ -1188,6 +1188,9 @@
  "Case=Dat|Number=Plur|POS=PRON|Person=1|PronType=Prs",
  "Case=Acc|Number=Plur|Number[psor]=Sing|POS=PROPN|Person[psor]=3",
  "Case=All|Number=Sing|Number[psed]=Sing|POS=PRON|Person=3|PronType=Tot"
+ ],
+ "lookup_lemmatizer":[
+
  ],
  "parser":[
  "ROOT",
@@ -1246,7 +1249,9 @@
  "senter",
  "tagger",
  "morphologizer",
+ "lookup_lemmatizer",
  "lemmatizer",
+ "lemma_smoother",
  "parser",
  "ner"
  ],
@@ -1255,7 +1260,9 @@
  "senter",
  "tagger",
  "morphologizer",
+ "lookup_lemmatizer",
  "lemmatizer",
+ "lemma_smoother",
  "parser",
  "ner"
  ],
@@ -1267,85 +1274,85 @@
  "token_p":0.998565417,
  "token_r":0.9993300153,
  "token_f":0.9989475698,
- "sents_p":0.9799107143,
+ "sents_p":0.9865168539,
  "sents_r":0.9777282851,
- "sents_f":0.9788182832,
- "tag_acc":0.9638738696,
- "pos_acc":0.9649265515,
- "morph_acc":0.9324337257,
- "morph_micro_p":0.9686020758,
- "morph_micro_r":0.9584873227,
- "morph_micro_f":0.9635181545,
+ "sents_f":0.9821029083,
+ "tag_acc":0.9677018039,
+ "pos_acc":0.9675104072,
+ "morph_acc":0.9386544167,
+ "morph_micro_p":0.9693278037,
+ "morph_micro_r":0.9642458101,
+ "morph_micro_f":0.9667801284,
  "morph_per_feat":{
  "Definite":{
- "p":0.9638278388,
- "r":0.9822678488,
- "f":0.9729604807
+ "p":0.9721835883,
+ "r":0.9785347643,
+ "f":0.9753488372
  },
  "PronType":{
- "p":0.9714128642,
- "r":0.9751655629,
- "f":0.9732855963
+ "p":0.979501385,
+ "r":0.9757174393,
+ "f":0.9776057506
  },
  "Case":{
- "p":0.9724,
- "r":0.9606797076,
- "f":0.9665043236
+ "p":0.9769016328,
+ "r":0.9693736416,
+ "f":0.9731230784
  },
  "Degree":{
- "p":0.9179170344,
- "r":0.8652246256,
- "f":0.8907922912
+ "p":0.9226086957,
+ "r":0.8826955075,
+ "f":0.9022108844
  },
  "Number":{
- "p":0.9863060017,
- "r":0.9777107424,
- "f":0.981989564
+ "p":0.9862277461,
+ "r":0.9840791017,
+ "f":0.9851522523
  },
  "Mood":{
- "p":0.9364035088,
- "r":0.9467849224,
- "f":0.9415656009
+ "p":0.9276457883,
+ "r":0.9523281596,
+ "f":0.9398249453
  },
  "Person":{
- "p":0.9592684954,
- "r":0.9490131579,
- "f":0.9541132699
+ "p":0.9452054795,
+ "r":0.9646381579,
+ "f":0.9548229548
  },
  "Tense":{
- "p":0.9638157895,
- "r":0.9712707182,
- "f":0.9675288938
+ "p":0.9589632829,
+ "r":0.9812154696,
+ "f":0.9699617695
  },
  "VerbForm":{
- "p":0.9621212121,
- "r":0.9165998396,
- "f":0.9388090349
+ "p":0.9598976109,
+ "r":0.9021651965,
+ "f":0.93013642
  },
  "Voice":{
- "p":0.9625506073,
- "r":0.972392638,
- "f":0.9674465921
+ "p":0.9570858283,
+ "r":0.9805725971,
+ "f":0.9686868687
  },
  "Number[psor]":{
- "p":0.972181552,
- "r":0.9458689459,
- "f":0.9588447653
+ "p":0.9793205318,
+ "r":0.9444444444,
+ "f":0.9615663524
  },
  "Person[psor]":{
- "p":0.9736456808,
- "r":0.9486447932,
- "f":0.960982659
+ "p":0.9807976366,
+ "r":0.9472182596,
+ "f":0.9637155298
  },
  "NumType":{
- "p":0.9494949495,
- "r":0.9170731707,
- "f":0.9330024814
+ "p":0.9223529412,
+ "r":0.956097561,
+ "f":0.9389221557
  },
  "Poss":{
- "p":0.75,
+ "p":0.5,
  "r":1.0,
- "f":0.8571428571
+ "f":0.6666666667
  },
  "Reflex":{
  "p":1.0,
@@ -1353,9 +1360,9 @@
  "f":0.9333333333
  },
  "Aspect":{
- "p":0.0,
- "r":0.0,
- "f":0.0
+ "p":1.0,
+ "r":0.25,
+ "f":0.4
  },
  "Number[psed]":{
  "p":0.0,
@@ -1363,114 +1370,114 @@
  "f":0.0
  }
  },
- "lemma_acc":0.9681370204,
- "dep_uas":0.8211578415,
- "dep_las":0.7476698484,
+ "lemma_acc":0.9716773514,
+ "dep_uas":0.8069939475,
+ "dep_las":0.736004483,
  "dep_las_per_type":{
  "det":{
- "p":0.8633921719,
- "r":0.8957006369,
- "f":0.8792497069
+ "p":0.8576923077,
+ "r":0.8877388535,
+ "f":0.872456964
  },
  "amod:att":{
- "p":0.83160415,
- "r":0.8520032706,
- "f":0.8416801292
+ "p":0.8787878788,
+ "r":0.8062142273,
+ "f":0.8409381663
  },
  "nsubj":{
- "p":0.7404458599,
- "r":0.7265625,
- "f":0.7334384858
+ "p":0.7076923077,
+ "r":0.71875,
+ "f":0.7131782946
  },
  "advmod:mode":{
- "p":0.5841584158,
- "r":0.5784313725,
- "f":0.5812807882
+ "p":0.5563380282,
+ "r":0.5808823529,
+ "f":0.5683453237
  },
  "nmod:att":{
- "p":0.767594108,
- "r":0.7949152542,
- "f":0.7810158201
+ "p":0.7508143322,
+ "r":0.7813559322,
+ "f":0.7657807309
  },
  "obl":{
- "p":0.7793167128,
- "r":0.7596759676,
- "f":0.7693710119
+ "p":0.7599645704,
+ "r":0.7722772277,
+ "f":0.7660714286
  },
  "obj":{
- "p":0.8401826484,
- "r":0.8269662921,
- "f":0.8335220838
+ "p":0.8466819222,
+ "r":0.8314606742,
+ "f":0.8390022676
  },
  "root":{
- "p":0.8080357143,
- "r":0.8062360802,
- "f":0.8071348941
+ "p":0.7730337079,
+ "r":0.7661469933,
+ "f":0.7695749441
  },
  "cc":{
- "p":0.7224669604,
- "r":0.6905263158,
- "f":0.7061356297
+ "p":0.6673913043,
+ "r":0.6463157895,
+ "f":0.656684492
  },
  "conj":{
- "p":0.4868686869,
- "r":0.5020833333,
- "f":0.4943589744
+ "p":0.50456621,
+ "r":0.4604166667,
+ "f":0.4814814815
  },
  "advmod":{
- "p":0.8,
- "r":0.8421052632,
- "f":0.8205128205
+ "p":0.8137254902,
+ "r":0.8736842105,
+ "f":0.8426395939
  },
  "flat:name":{
- "p":0.8384279476,
- "r":0.8971962617,
- "f":0.8668171558
+ "p":0.8085106383,
+ "r":0.8878504673,
+ "f":0.846325167
  },
  "appos":{
- "p":0.5068493151,
+ "p":0.4352941176,
  "r":0.3936170213,
- "f":0.4431137725
+ "f":0.4134078212
  },
  "advcl":{
- "p":0.2702702703,
- "r":0.2040816327,
- "f":0.2325581395
+ "p":0.3552631579,
+ "r":0.2755102041,
+ "f":0.3103448276
  },
  "advmod:tlocy":{
- "p":0.6833333333,
- "r":0.7130434783,
- "f":0.6978723404
+ "p":0.7136752137,
+ "r":0.7260869565,
+ "f":0.7198275862
  },
  "ccomp:obj":{
- "p":0.2807017544,
- "r":0.4848484848,
- "f":0.3555555556
+ "p":0.1746031746,
+ "r":0.3333333333,
+ "f":0.2291666667
  },
  "mark":{
- "p":0.8881578947,
- "r":0.8544303797,
- "f":0.8709677419
+ "p":0.8104575163,
+ "r":0.7848101266,
+ "f":0.7974276527
  },
  "compound:preverb":{
- "p":0.8793103448,
- "r":0.9357798165,
- "f":0.9066666667
+ "p":0.8928571429,
+ "r":0.9174311927,
+ "f":0.9049773756
  },
  "advmod:locy":{
- "p":0.6818181818,
- "r":0.46875,
- "f":0.5555555556
+ "p":0.76,
+ "r":0.59375,
+ "f":0.6666666667
  },
  "cop":{
- "p":0.6279069767,
- "r":0.6585365854,
- "f":0.6428571429
+ "p":0.9333333333,
+ "r":0.3414634146,
+ "f":0.5
  },
  "nmod:obl":{
- "p":0.1764705882,
- "r":0.075,
- "f":0.1052631579
+ "p":0.3333333333,
+ "r":0.225,
+ "f":0.2686567164
  },
  "advmod:to":{
  "p":0.0,
@@ -1483,74 +1490,69 @@
  "f":0.0
  },
  "ccomp:obl":{
- "p":0.5882352941,
- "r":0.3125,
- "f":0.4081632653
+ "p":0.3488372093,
+ "r":0.46875,
+ "f":0.4
  },
  "iobj":{
- "p":0.2105263158,
- "r":0.2666666667,
- "f":0.2352941176
+ "p":0.25,
+ "r":0.1333333333,
+ "f":0.1739130435
  },
- "dep":{
- "p":0.0,
- "r":0.0,
- "f":0.0
+ "xcomp":{
+ "p":0.9428571429,
+ "r":0.8918918919,
+ "f":0.9166666667
  },
  "case":{
- "p":0.942408377,
- "r":0.9183673469,
- "f":0.9302325581
+ "p":0.9513513514,
+ "r":0.8979591837,
+ "f":0.9238845144
  },
  "csubj":{
- "p":0.4,
- "r":0.4324324324,
- "f":0.4155844156
+ "p":0.3666666667,
+ "r":0.2972972973,
+ "f":0.328358209
  },
  "parataxis":{
- "p":0.3265306122,
- "r":0.2191780822,
- "f":0.262295082
- },
- "xcomp":{
- "p":0.7945205479,
- "r":0.7837837838,
- "f":0.7891156463
+ "p":0.2790697674,
+ "r":0.1643835616,
+ "f":0.2068965517
  },
  "nummod":{
- "p":0.5692307692,
- "r":0.3978494624,
- "f":0.4683544304
+ "p":0.5384615385,
+ "r":0.6774193548,
+ "f":0.6
  },
  "acl":{
- "p":0.3653846154,
- "r":0.2638888889,
- "f":0.3064516129
+ "p":0.2823529412,
+ "r":0.3333333333,
+ "f":0.3057324841
  },
- "ccomp:pred":{
- "p":0.0,
- "r":0.0,
- "f":0.0
+ "ccomp":{
+ "p":0.3333333333,
+ "r":0.0769230769,
+ "f":0.125
+ },
+ "compound":{
+ "p":0.65,
+ "r":0.975,
+ "f":0.78
  },
  "advmod:tto":{
- "p":0.5,
- "r":0.2,
- "f":0.2857142857
+ "p":0.3157894737,
+ "r":0.6,
+ "f":0.4137931034
  },
  "nmod":{
- "p":0.25,
+ "p":0.0526315789,
  "r":0.0909090909,
- "f":0.1333333333
- },
- "orphan":{
- "p":0.0,
- "r":0.0,
- "f":0.0
+ "f":0.0666666667
  },
  "aux":{
- "p":1.0,
- "r":0.25,
- "f":0.4
+ "p":0.8888888889,
+ "r":0.6666666667,
+ "f":0.7619047619
  },
  "advmod:tfrom":{
  "p":0.0,
@@ -1562,17 +1564,12 @@
  "r":0.0,
  "f":0.0
  },
- "compound":{
- "p":0.95,
- "r":0.95,
- "f":0.95
- },
- "ccomp":{
+ "obl:lvc":{
  "p":0.0,
  "r":0.0,
  "f":0.0
  },
- "obl:lvc":{
+ "orphan":{
  "p":0.0,
  "r":0.0,
  "f":0.0
@@ -1583,42 +1580,52 @@
  "f":0.0
  },
  "list":{
- "p":0.5,
+ "p":1.0,
  "r":0.1666666667,
- "f":0.25
+ "f":0.2857142857
+ },
+ "dep":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
  },
  "advmod:que":{
- "p":1.0,
+ "p":0.6666666667,
  "r":0.5,
- "f":0.6666666667
+ "f":0.5714285714
+ },
+ "ccomp:pred":{
+ "p":0.0,
+ "r":0.0,
+ "f":0.0
  }
  },
- "ents_p":0.8607281725,
- "ents_r":0.8561884669,
- "ents_f":0.858452318,
+ "ents_p":0.847826087,
+ "ents_r":0.8570675105,
+ "ents_f":0.8524217521,
  "ents_per_type":{
  "ORG":{
- "p":0.8931902985,
- "r":0.8878071395,
- "f":0.8904905836
+ "p":0.8934198332,
+ "r":0.8938340287,
+ "f":0.893626883
  },
  "PER":{
- "p":0.8891523414,
- "r":0.8960573477,
- "f":0.8925914906
+ "p":0.8527607362,
+ "r":0.9133811231,
+ "f":0.882030574
  },
  "LOC":{
- "p":0.8766404199,
- "r":0.8697916667,
- "f":0.8732026144
+ "p":0.8836363636,
+ "r":0.84375,
+ "f":0.8632326821
  },
  "MISC":{
- "p":0.6622807018,
- "r":0.6425531915,
- "f":0.6522678186
+ "p":0.6380543634,
+ "r":0.6326241135,
+ "f":0.6353276353
  }
  },
- "speed":645.2528714004
+ "speed":555.2070840132
  },
  "sources":[
  {
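To read the `meta.json` changes at a glance, the headline metrics of both versions can be compared directly. This is a rough sketch using values copied from the diff above; the dictionary names and the choice of metrics are illustrative.

```python
# Headline values copied from the old (3.3.0) and new (3.3.1) meta.json.
old = {
    "lemma_acc": 0.9681370204,
    "dep_uas": 0.8211578415,
    "dep_las": 0.7476698484,
    "ents_f": 0.858452318,
    "speed": 645.2528714004,
}
new = {
    "lemma_acc": 0.9716773514,
    "dep_uas": 0.8069939475,
    "dep_las": 0.736004483,
    "ents_f": 0.8524217521,
    "speed": 555.2070840132,
}

# Absolute change per metric (new minus old).
deltas = {name: new[name] - old[name] for name in old}

# Relative change of the reported speed value.
speed_change = (new["speed"] - old["speed"]) / old["speed"]

for name, delta in deltas.items():
    print(f"{name}: {delta:+.4f}")
print(f"relative speed change: {speed_change:+.1%}")
```

In these numbers lemma accuracy rises while the parser scores, the NER F-score and the reported speed fall.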
morphologizer/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:85c95d1d6326af2603f7df62c9f4fe9f3b7d58c6c8f9f1dba963807ab51cab50
+ oid sha256:f5ca20500cbdd421aef3d9684d3a5783decf95365c78d4d57a5f4dd13e4d577a
  size 1383846
ner/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:426b47725c1cca40415a0574a691c2f969fc67a096b7ec153d889c496c1f72de
+ oid sha256:eedc3bce0b2163ddc254979ebc0638fd1fafc17757e4174a61a52a220ca67605
  size 56989063
parser/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a919f2697e61b8b75cd66899d402bcee6a4020770957e41756ff8d8840ad77ad
+ oid sha256:4b6ef22e01aa41f9994276b00d702e8209874838e0b7cb75126a48bf590c62a6
  size 26010735
senter/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:cfb142ec1952a5cf8602016a742ec6696a72c4c5944f2f7ad1de1f2fc2a93616
+ oid sha256:0dc6cfcf9c72e9b95f410351ddb9e3af75ac92379d5bf952bb125cba3d746a4a
  size 2845
tagger/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:82668df330aa2a60e68bfee796a3ce6b33943ebe8a0445421dddb0f8ebf86962
+ oid sha256:b6c130a4c5ad4a2974b06f0c984b4205cbe68ad24a3869ee9fedde33b1277b75
  size 20905
tok2vec/model CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:91d2a4cd8472b6771b2d8ace65a4a17ad73fd8f5a04530687b6fbdaf26f8a821
+ oid sha256:2249eb08d5f2dde70f0267a57d3dce7dbeffd51a5dbbdab6b2ed77680d24cac8
  size 56806299
vocab/strings.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f753e03c09fb53138101f44d1a7f3657855c116aeba6a26f5aca39bd43d67dc7
- size 6403937
+ oid sha256:6ac5abb49f76f2c3e494f52266f182f777ec5e097cba4125a7107b6c3500792f
+ size 6404104