cpalenmichel committed on
Commit
a1e8328
·
verified ·
1 Parent(s): 0c2020b

Create README.md

Files changed (1)
  1. README.md +123 -0
README.md ADDED
@@ -0,0 +1,123 @@
+ ---
+ # For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
+ # Doc / guide: https://huggingface.co/docs/hub/model-cards
+ {}
+ ---
+
+ # Model Card for Model ID
+
+ E-commerce query segmentation model in English.
+
+
+ ## Model Details
+
+ ### Model Description
+
+ This is a token classification model that uses BERT base uncased as its base model.
+ The model is fine-tuned on the [QueryNER training dataset](https://huggingface.co/datasets/bltlab/queryner).
+
+
+ - **Developed by:** [BLT Lab](https://github.com/bltlab) in collaboration with eBay
+ - **Funded by:** eBay
+ - **Shared by:** [@cpalenmichel](https://github.com/cpalenmichel)
+ - **Model type:** Token Classification / Sequence Labeling / Chunking
+ - **Language(s) (NLP):** English
+ - **License:** CC-BY 4.0
+ - **Finetuned from model:** BERT base uncased
+
+ ### Model Sources
+
+ The underlying model is [BERT base uncased](https://huggingface.co/google-bert/bert-base-uncased).
+
+ - **Repository:** [https://github.com/bltlab/query-ner](https://github.com/bltlab/query-ner)
+ - **Paper:** Accepted at LREC-COLING; coming soon
+
+ ## Uses
+
+ ### Direct Use
+
+ The model is intended for research purposes and e-commerce query segmentation.
+
+ ### Downstream Use
+
+ Potential downstream uses include weighting entity spans, linking spans to knowledge bases, and removing spans as a recovery strategy for null and low-recall queries.
+
+ ### Out-of-Scope Use
+
+ This model is trained only on the training data of the QueryNER dataset. It may not perform well on other domains without additional training data and further fine-tuning.
+
+ ## Bias, Risks, and Limitations
+
+ See the limitations section of the paper.
+
+ ## How to Get Started with the Model
+
+ See the Hugging Face tutorials for token classification and load the model using `AutoModelForTokenClassification`.
+ Note that, unlike the inference API, we apply post-processing so that only the first subtoken's tag is used for each word.
+
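The first-subtoken post-processing mentioned above can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes you already have one predicted label per subtoken and a `word_ids` list of the kind returned by the `word_ids()` method of a fast tokenizer's encoding in `transformers` (with `None` for special tokens). The function name and the example labels are hypothetical.

```python
from typing import List, Optional, Sequence


def first_subtoken_labels(
    word_ids: Sequence[Optional[int]],
    subtoken_labels: Sequence[str],
) -> List[str]:
    """Return one label per word: the label predicted for its first subtoken."""
    if len(word_ids) != len(subtoken_labels):
        raise ValueError("word_ids and subtoken_labels must have the same length")

    word_labels: List[str] = []
    previous_word_id: Optional[int] = None
    for word_id, label in zip(word_ids, subtoken_labels):
        if word_id is None:
            # Special tokens such as [CLS] and [SEP] belong to no word.
            continue
        if word_id != previous_word_id:
            # First subtoken of a new word: keep its label.
            word_labels.append(label)
        # Later subtokens of the same word are ignored.
        previous_word_id = word_id
    return word_labels


# Hypothetical query "nike airmax shoes" split as
# [CLS] nike air ##max shoes [SEP]; labels are illustrative only.
example_word_ids = [None, 0, 1, 1, 2, None]
example_subtoken_labels = ["O", "B-brand", "B-model", "I-type", "B-type", "O"]
example_word_labels = first_subtoken_labels(example_word_ids, example_subtoken_labels)
```

Here the label predicted for `##max` is discarded, so the word `airmax` receives the label of its first subtoken `air`.
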
+ ## Training Details
+
+ ### Training Data
+
+ See paper for details.
+
+
+ ### Training Procedure
+
+ See paper for details.
+
+ #### Training Hyperparameters
+
+ See paper for details.
+
+
+ ## Evaluation
+
+ Evaluation details are provided in the paper.
+ Scoring was done with [SeqScore](https://github.com/bltlab/seqscore), using the conlleval repair method for invalid label transition sequences.
+
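To give a rough idea of what repairing invalid label transitions means, the sketch below rewrites an `I-` tag as `B-` whenever it does not continue an entity of the same type. It assumes BIO-encoded labels and reflects our understanding of conlleval-style repair. It is not SeqScore's implementation, which remains the reference; the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def repair_bio_conlleval_style(labels: Sequence[str]) -> List[str]:
    """Turn any I-X that does not continue an X entity into B-X (BIO only)."""
    repaired: List[str] = []
    previous_type: Optional[str] = None  # None at sequence start or after "O"
    for label in labels:
        if label == "O":
            repaired.append(label)
            previous_type = None
            continue
        prefix, _, entity_type = label.partition("-")
        if prefix == "I" and entity_type != previous_type:
            # Invalid transition: treat this token as starting a new entity.
            label = "B-" + entity_type
        repaired.append(label)
        previous_type = entity_type
    return repaired


# Illustrative labels only; "A" and "B" are placeholder entity types.
example_repaired = repair_bio_conlleval_style(["O", "I-A", "I-A", "B-B", "I-A"])
```

Already valid sequences pass through unchanged, so the repair only affects predictions that break the BIO scheme.
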
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ QueryNER test set: [https://huggingface.co/datasets/bltlab/queryner](https://huggingface.co/datasets/bltlab/queryner)
+
+
+ #### Factors
+ Evaluation is reported as entity-level micro-F1 on the QueryNER test set.
+ We used the conlleval repair method for invalid label transitions.
+
+ #### Metrics
+ We use entity-level micro-F1, as this is fairly common practice for NER models.
+
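For readers unfamiliar with the metric, entity-level micro-F1 counts an entity as correct only if its span and type both match the gold annotation, and pools the counts over all queries before computing precision and recall. The following is a minimal sketch under the assumption of valid (already repaired) BIO labels. The reported scores come from SeqScore, not from this code, and all names here are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(labels: Sequence[str]) -> Set[Span]:
    """Extract entity spans from a valid BIO label sequence."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    entity_type: Optional[str] = None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, label in enumerate(list(labels) + ["O"]):
        continues = label.startswith("I-") and label[2:] == entity_type
        if start is not None and not continues:
            spans.add((start, i, entity_type))
            start, entity_type = None, None
        if label.startswith("B-"):
            start, entity_type = i, label[2:]
    return spans


def micro_f1(gold: Sequence[Sequence[str]], pred: Sequence[Sequence[str]]) -> float:
    """Entity-level micro-F1 with counts pooled over all sequences."""
    true_positives = n_gold = n_pred = 0
    for gold_labels, pred_labels in zip(gold, pred):
        gold_spans, pred_spans = bio_spans(gold_labels), bio_spans(pred_labels)
        true_positives += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / n_pred
    recall = true_positives / n_gold
    return 2 * precision * recall / (precision + recall)


# Placeholder entity types "A" and "B"; two short sequences.
example_gold: List[List[str]] = [["B-A", "I-A", "O", "B-B"], ["B-A"]]
example_pred: List[List[str]] = [["B-A", "I-A", "O", "B-A"], ["B-A"]]
example_score = micro_f1(example_gold, example_pred)
```

In this example two of three predicted entities match two of three gold entities, so precision, recall, and F1 are all 2/3. The last entity of the first sequence has the right span but the wrong type and therefore counts as an error.
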
+ ### Results
+
+ [More Information Needed]
+
+
+ ## Environmental Impact
+ Rough estimate:
+
+ - **Hardware Type:** 1 RTX 3090 GPU
+ - **Hours used:** < 2 hours
+ - **Cloud Provider:** Private
+ - **Compute Region:** northamerica-northeast1
+ - **Carbon Emitted:** 0.02
+
+
+ ## Citation
+
+ Accepted at LREC-COLING; coming soon.
+
+ **BibTeX:**
+
+ Accepted at LREC-COLING; coming soon.
+
+
+ ## Model Card Authors
+
+ Chester Palen-Michel [@cpalenmichel](https://github.com/cpalenmichel)
+
+ ## Model Card Contact
+
+ Chester Palen-Michel [@cpalenmichel](https://github.com/cpalenmichel)