Token Classification
spaCy
Tagalog
ljvmiranda921 committed on
Commit 16d6080 · verified · 1 Parent(s): 23a4a7d

Update README.md

  1. README.md +28 -2
README.md CHANGED
@@ -6,7 +6,14 @@ language:
  - tl
  license: mit
  ---
- calamanCy: Tagalog NLP pipelines in spaCy
+
+ # calamanCy: Tagalog NLP pipelines in spaCy
+
+ This is the latest **medium-sized pipeline** for calamanCy.
+ Compared to the 0.1.0 version, this pipeline is trained on a larger treebank ([UD-NewsCrawl](https://huggingface.co/datasets/UD-Filipino/UD_Tagalog-NewsCrawl)), with large improvements in dependency parsing, morphological annotation, and POS tagging.
+ This pipeline also implements a neural edit-tree lemmatizer, allowing better lemmatization than the previous model.
+ The training code can be found [in GitHub](https://github.com/ljvmiranda921/calamanCy/tree/master/models/v0.1.0).
+
 
  | Feature | Description |
  | --- | --- |
@@ -33,4 +40,23 @@ calamanCy: Tagalog NLP pipelines in spaCy
  | **`parser`** | `ROOT`, `acl`, `acl:relcl`, `advcl`, `advmod`, `amod`, `appos`, `case`, `cc`, `ccomp`, `compound`, `compound:redup`, `conj`, `dep`, `det`, `discourse`, `dislocated`, `fixed`, `flat`, `goeswith`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obj:agent`, `obl`, `orphan`, `parataxis`, `punct`, `vocative`, `xcomp` |
  | **`ner`** | `LOC`, `ORG`, `PER` |
 
- </details>
+ </details>
+
+ ### Citation
+
+ If you're using this model, please cite:
+
+ ```
+ @inproceedings{miranda-2023-calamancy,
+ title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
+ author = "Miranda, Lester James",
+ booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
+ month = dec,
+ year = "2023",
+ address = "Singapore",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2023.nlposs-1.1/",
+ doi = "10.18653/v1/2023.nlposs-1.1",
+ pages = "1--7",
+ }
+ ```