Token Classification
spaCy
Tagalog
ljvmiranda921 committed on
Commit a2ae35c · verified · 1 Parent(s): 0f7d401

Update README.md

Files changed (1)
  1. README.md +29 -2
README.md CHANGED
@@ -6,7 +6,15 @@ language:
  - tl
  license: mit
  ---
- calamanCy: Tagalog NLP pipelines in spaCy
+
+ <img src="https://raw.githubusercontent.com/ljvmiranda921/calamanCy/refs/heads/master/logo.png" width="130" height="130" align="right" />
+
+ # calamanCy: Tagalog NLP pipelines in spaCy
+
+ This is the latest **large-sized pipeline** for [calamanCy](https://arxiv.org/abs/2311.07171).
+ Compared to the 0.1.0 version, this pipeline is trained on a larger treebank ([UD-NewsCrawl](https://huggingface.co/datasets/UD-Filipino/UD_Tagalog-NewsCrawl)), with large improvements in dependency parsing, morphological annotation, and POS tagging.
+ This pipeline also implements a neural edit-tree lemmatizer, allowing better lemmatization than the previous model.
+ The training code can be found [on GitHub](https://github.com/ljvmiranda921/calamanCy/tree/master/models/v0.1.0).
 
  | Feature | Description |
  | --- | --- |
@@ -33,4 +41,23 @@ calamanCy: Tagalog NLP pipelines in spaCy
  | **`parser`** | `ROOT`, `acl`, `acl:relcl`, `advcl`, `advmod`, `amod`, `appos`, `case`, `cc`, `ccomp`, `compound`, `compound:redup`, `conj`, `dep`, `det`, `discourse`, `dislocated`, `fixed`, `flat`, `goeswith`, `list`, `mark`, `nmod`, `nmod:poss`, `nsubj`, `nummod`, `obj`, `obj:agent`, `obl`, `orphan`, `parataxis`, `punct`, `vocative`, `xcomp` |
  | **`ner`** | `LOC`, `ORG`, `PER` |
 
- </details>
+ </details>
+
+ ### Citation
+
+ If you're using this model, please cite:
+
+ ```
+ @inproceedings{miranda-2023-calamancy,
+ title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
+ author = "Miranda, Lester James",
+ booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
+ month = dec,
+ year = "2023",
+ address = "Singapore",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2023.nlposs-1.1/",
+ doi = "10.18653/v1/2023.nlposs-1.1",
+ pages = "1--7",
+ }
+ ```
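
The updated README mentions a neural edit-tree lemmatizer without saying what an edit tree is. The sketch below is a rough, pure-Python illustration of the general idea: an edit tree records how to turn one inflected form into its lemma by keeping the longest shared substring and recursing on what is left on either side, so the same tree can be reused on other words with the same pattern. It is not the code used in this pipeline or in spaCy. The names `Subst`, `Match`, `build_tree`, and `apply_tree` are hypothetical, and the neural part (a model that predicts which tree to apply to each token) is left out entirely.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional, Union


@dataclass(frozen=True)
class Subst:
    """Leaf node: replace the exact string `old` with `new`."""
    old: str
    new: str


@dataclass(frozen=True)
class Match:
    """Interior node: keep the middle of the word, edit the two ends."""
    prefix_len: int            # characters before the kept substring
    suffix_len: int            # characters after the kept substring
    left: Optional["Tree"]     # edit for the prefix (None = empty to empty)
    right: Optional["Tree"]    # edit for the suffix (None = empty to empty)


Tree = Union[Subst, Match]


def build_tree(form: str, lemma: str) -> Optional[Tree]:
    """Derive an edit tree that rewrites `form` into `lemma`."""
    if not form and not lemma:
        return None
    m = SequenceMatcher(None, form, lemma, autojunk=False).find_longest_match(
        0, len(form), 0, len(lemma)
    )
    if m.size == 0:
        # Nothing in common: fall back to a plain substitution.
        return Subst(form, lemma)
    left = build_tree(form[: m.a], lemma[: m.b])
    right = build_tree(form[m.a + m.size:], lemma[m.b + m.size:])
    return Match(m.a, len(form) - m.a - m.size, left, right)


def apply_tree(tree: Optional[Tree], form: str) -> Optional[str]:
    """Apply an edit tree to `form`; return None if the tree does not fit."""
    if tree is None:
        return "" if form == "" else None
    if isinstance(tree, Subst):
        return tree.new if form == tree.old else None
    if tree.prefix_len + tree.suffix_len > len(form):
        return None
    end = len(form) - tree.suffix_len
    left = apply_tree(tree.left, form[: tree.prefix_len])
    right = apply_tree(tree.right, form[end:])
    if left is None or right is None:
        return None
    return left + form[tree.prefix_len:end] + right


if __name__ == "__main__":
    # Learn the "-um-" infix pattern from one pair, then reuse it.
    tree = build_tree("kumain", "kain")
    print(apply_tree(tree, "bumili"))  # bili
    print(apply_tree(tree, "takbo"))   # None: the pattern does not apply
```

Because the tree stores positions and small substitutions rather than whole words, the pattern learned from `kumain` → `kain` transfers to `bumili`, while a word without the infix is rejected instead of being mangled.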