stcoats committed on
Commit 3b2aafd · 1 Parent(s): 94b8006

Update README.md

  1. README.md +44 -3
README.md CHANGED
@@ -1,3 +1,44 @@
  ---
  tags:
  - spacy
@@ -23,9 +64,9 @@ model-index:
  | **Default Pipeline** | `tok2vec`, `tagger` |
  | **Components** | `tok2vec`, `tagger` |
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
- | **Sources** | n/a |
- | **License** | n/a |
- | **Author** | [n/a]() |
 
  ### Label Scheme
 
 
+ ---
+ tags:
+ - spacy
+ - token-classification
+ language:
+ - de
+ model-index:
+ - name: de_pipeline
+   results:
+   - task:
+       name: TAG
+       type: token-classification
+     metrics:
+     - name: TAG (XPOS) Accuracy
+       type: accuracy
+       value: 0.9191333537
+ license: cc-by-4.0
+ library_name: spacy
+ ---
+ ## de_STTS2_folk tagger
+
+ This is a spaCy language model trained to assign tags from the Stuttgart-Tübingen Tagset version 2.0, which was designed for tagging transcripts of conversational speech in German.
+ The model may be useful for tagging ASR transcripts such as those collected in the [CoGS](https://cc.oulu.fi/~scoats/CoGS.html) corpus.
+
+ The model was trained using the tag annotations from the FOLK corpus at https://agd.ids-mannheim.de/folk-gold.shtml. Tokens in the training data were converted to lower case prior to training, to match the format used for automatic speech recognition transcripts on YouTube as of early 2023.
+
+ Usage example:
+ ```python
+ # Notebook syntax (Jupyter/Colab); from a shell, run the same command without the leading "!"
+ !pip install https://huggingface.co/stcoats/de_pipeline/resolve/main/de_pipeline-any-py3-none-any.whl
+ import de_pipeline
+ # Load the installed pipeline package
+ nlp = de_pipeline.load()
+ doc = nlp("ach so meinst du wir sollen es jetzt tun")
+ # Print each token with its fine-grained tag
+ for token in doc:
+     print(token.text, token.tag_)
+ ```
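Because the training tokens were lowercased, input in standard orthography probably benefits from the same normalization before tagging. The following is a minimal sketch under that assumption, not part of the commit or the model package: `normalize_for_tagger` and `tag_text` are hypothetical helper names, and collapsing whitespace is an extra assumption beyond the lowercasing the README describes.

```python
from typing import Callable, List, Tuple


def normalize_for_tagger(text: str) -> str:
    """Lowercase the text and collapse runs of whitespace (hypothetical helper)."""
    # Lowercasing mirrors the training data; whitespace collapsing is an assumption.
    return " ".join(text.lower().split())


def tag_text(nlp: Callable, text: str) -> List[Tuple[str, str]]:
    """Run a loaded pipeline on normalized text and return (token, tag) pairs."""
    doc = nlp(normalize_for_tagger(text))
    return [(token.text, token.tag_) for token in doc]


# Normalization alone needs no model:
print(normalize_for_tagger("Ach so meinst du  wir sollen es JETZT tun"))
# prints: ach so meinst du wir sollen es jetzt tun
```

With the package installed, `tag_text(de_pipeline.load(), "Ach so meinst du wir sollen es jetzt tun")` would return the token and tag pairs for the lowercased sentence.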
+ ### References
+
+ Coats, Steven. (In review).
+
+ Westpfahl, Swantje and Thomas Schmidt. (2016): [FOLK-Gold – A GOLD standard for Part-of-Speech-Tagging of Spoken German](https://aclanthology.org/L16-1237). In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia. Paris: European Language Resources Association (ELRA), pp. 1493-1499.
+
  ---
  tags:
  - spacy
 
  | **Default Pipeline** | `tok2vec`, `tagger` |
  | **Components** | `tok2vec`, `tagger` |
  | **Vectors** | 0 keys, 0 unique vectors (0 dimensions) |
+ | **Sources** | Swantje Westpfahl and Thomas Schmidt, FOLK-Gold, https://agd.ids-mannheim.de/folk-gold.shtml |
+ | **License** | CC-BY 4.0 |
+ | **Author** | Steven Coats |
 
  ### Label Scheme