This is the fine-tuned model for the joint parsing of the following tasks:

- Syntactical Parsing (Dependency-Tree)
- Named-Entity Recognition
A live demo of the model can be found [here](https://huggingface.co/spaces/dicta-il/joint-demo).

For a faster model, you can use the equivalent bert-tiny model for this task [here](https://huggingface.co/dicta-il/dictabert-tiny-joint).

For the bert-base models for other tasks, see [here](https://huggingface.co/collections/dicta-il/dictabert-6588e7cc08f83845fc42a18b).

---

The model currently supports 3 types of output:

1. **JSON**: The model returns a JSON object for each sentence in the input, containing the sentence text, the NER entities, and the list of tokens, where each token includes the output from each of the tasks.

   ```python
   model.predict(..., output_style='json')
   ```

1. **UD**: The model returns the full UD output for each sentence, according to the style of the Hebrew UD Treebank.

   ```python
   model.predict(..., output_style='ud')
   ```

1. **UD, in the style of IAHLT**: The model returns the full UD output, with slight modifications to match the style of IAHLT. These differences are mostly in the granularity of some dependency relations, in how the suffix of a word is broken up, and in implicit definite articles. The actual tagging behavior doesn't change.

   ```python
   model.predict(..., output_style='iahlt_ud')
   ```

---

If you only need the output for one of the tasks, you can tell the model not to initialize some of the heads, for example:

```python
model = AutoModel.from_pretrained('dicta-il/dictabert-joint', trust_remote_code=True, do_lex=False)
```

The full list of options is: `do_lex`, `do_syntax`, `do_ner`, `do_prefix`, `do_morph`.

---

Sample usage:

```python
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('dicta-il/dictabert-joint')
model = AutoModel.from_pretrained('dicta-il/dictabert-joint', trust_remote_code=True)

model.eval()

sentence = 'בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים'

print(model.predict([sentence], tokenizer, output_style='json')) # see below for other return formats
```

Output:

```json
[
  ...
]
```

You can also choose to get your response in UD format:

```python
sentence = 'בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים'
print(model.predict([sentence], tokenizer, output_style='ud'))
```

Results:

```json
[
  [
    "# sent_id = 1",
    "# text = בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת ובתולדות האמנות והחל לפרסם מאמרים הומוריסטיים",
    "1-2\tבשנת\t_\t_\t_\t_\t_\t_\t_\t_",
    "1\tב\tב\tADP\tADP\t_\t2\tcase\t_\t_",
    "2\tשנת\tשנה\tNOUN\tNOUN\tGender=Fem|Number=Sing\t4\tobl\t_\t_",
    "3\t1948\t1948\tNUM\tNUM\t\t2\tcompound:smixut\t_\t_",
    "4\tהשלים\tהשלים\tVERB\tVERB\tGender=Masc|Number=Sing|Person=3|Tense=Past\t0\troot\t_\t_",
    "5\tאפרים\tאפרים\tPROPN\tPROPN\t\t4\tnsubj\t_\t_",
    "6\tקישון\tקישון\tPROPN\tPROPN\t\t5\tflat\t_\t_",
    "7\tאת\tאת\tADP\tADP\t\t8\tcase:acc\t_\t_",
    "8-10\tלימודיו\t_\t_\t_\t_\t_\t_\t_\t_",
    "8\tלימודי_\tלימוד\tNOUN\tNOUN\tGender=Masc|Number=Plur\t4\tobj\t_\t_",
    "9\t_של_\tשל\tADP\tADP\t_\t10\tcase\t_\t_",
    "10\t_הוא\tהוא\tPRON\tPRON\tGender=Masc|Number=Sing|Person=3\t8\tnmod:poss\t_\t_",
    "11-12\tבפיסול\t_\t_\t_\t_\t_\t_\t_\t_",
    "11\tב\tב\tADP\tADP\t_\t12\tcase\t_\t_",
    "12\tפיסול\tפיסול\tNOUN\tNOUN\tGender=Masc|Number=Sing\t8\tnmod\t_\t_",
    "13\tמתכת\tמתכת\tNOUN\tNOUN\tGender=Fem|Number=Sing\t12\tcompound:smixut\t_\t_",
    "14-16\tובתולדות\t_\t_\t_\t_\t_\t_\t_\t_",
    "14\tו\tו\tCCONJ\tCCONJ\t_\t16\tcc\t_\t_",
    "15\tב\tב\tADP\tADP\t_\t16\tcase\t_\t_",
    "16\tתולדות\tתולדה\tNOUN\tNOUN\tGender=Fem|Number=Plur\t12\tconj\t_\t_",
    "17-18\tהאמנות\t_\t_\t_\t_\t_\t_\t_\t_",
    "17\tה\tה\tDET\tDET\t_\t18\tdet\t_\t_",
    "18\tאמנות\tאומנות\tNOUN\tNOUN\tGender=Fem|Number=Sing\t16\tcompound:smixut\t_\t_",
    "19-20\tוהחל\t_\t_\t_\t_\t_\t_\t_\t_",
    "19\tו\tו\tCCONJ\tCCONJ\t_\t20\tcc\t_\t_",
    "20\tהחל\tהחל\tVERB\tVERB\tGender=Masc|Number=Sing|Person=3|Tense=Past\t4\tconj\t_\t_",
    "21\tלפרסם\tפרסם\tVERB\tVERB\t\t20\txcomp\t_\t_",
    "22\tמאמרים\tמאמר\tNOUN\tNOUN\tGender=Masc|Number=Plur\t21\tobj\t_\t_",
    "23\tהומוריסטיים\tהומוריסטי\tADJ\tADJ\tGender=Masc|Number=Plur\t22\tamod\t_\t_"
  ]
]
```
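Each string in the UD result is one tab-separated CoNLL-U record. If you want structured access to the individual fields, a minimal sketch in plain Python is shown below; the `parse_conllu_line` helper and the field names are illustrative (they follow the standard CoNLL-U column order) and are not part of the model's API:

```python
# Illustrative helper (not part of the model's API): split CoNLL-U lines,
# as returned with output_style='ud', into field dicts.
CONLLU_FIELDS = ['id', 'form', 'lemma', 'upos', 'xpos',
                 'feats', 'head', 'deprel', 'deps', 'misc']

def parse_conllu_line(line):
    """Return a dict of the 10 CoNLL-U columns, or None for '#' comment lines."""
    if line.startswith('#'):
        return None
    return dict(zip(CONLLU_FIELDS, line.split('\t')))

token = parse_conllu_line('4\tהשלים\tהשלים\tVERB\tVERB\tGender=Masc|Number=Sing|Person=3|Tense=Past\t0\troot\t_\t_')
print(token['form'], token['head'], token['deprel'])  # prints: השלים 0 root
```

Multi-word tokens (range ids like `8-10`) come through with `_` in most columns; if you only want syntactic words, skip records where `'-' in token['id']`.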

## Citation