Update README from local repo
README.md
CHANGED
@@ -39,7 +39,6 @@ Parent Model: For more details about SimCSE, we encourage users to check out the
|
|
39 |
### Example of classification of a set of responses:
|
40 |
|
41 |
```python
|
42 |
-
|
43 |
from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
44 |
import torch
|
45 |
import pandas as pd
|
@@ -79,6 +78,35 @@ print(df)
|
|
79 |
|Necessity| 0.001| 0.001| 0.002| 0.980| 0.016|
|
80 |
|My job went remote and I needed to take care of my kids| 0.000| 0.000| 0.000| 0.000| 1.000|
|
81 |
|
82 |
## Uses
|
83 |
|
84 |
### Direct Uses
|
@@ -95,7 +123,7 @@ This model is intended to be used on survey responses for data cleaning to help
|
|
95 |
|
96 |
## Misuses and Out-of-scope Use
|
97 |
|
98 |
-
The model has been trained to specifically identify survey non-response in open ended responses where the respondent taking the survey has given a response but their answer does not respond to the question at hand or providing any meaningful insight. Some examples of these types of responses are `meow`, `ksdhfkshgk`, or `idk`. The model was
|
99 |
|
100 |
The responses the model was trained on also come from both web- and phone-based open-ended probes. The model may be less effective on more traditional open-ended survey questions with responses provided in other mediums.
|
101 |
|
@@ -114,7 +142,7 @@ Some examples of refusal responses also can appear to be valid as they did not o
|
|
114 |
|
115 |
#### Training Data
|
116 |
|
117 |
-
The model was
|
118 |
|
119 |
#### Training procedure
|
120 |
|
|
|
39 |
### Example of classification of a set of responses:
|
40 |
|
41 |
```python
|
|
|
42 |
from transformers import AutoTokenizer, AutoModelForSequenceClassification
|
43 |
import torch
|
44 |
import pandas as pd
|
|
|
78 |
|Necessity| 0.001| 0.001| 0.002| 0.980| 0.016|
|
79 |
|My job went remote and I needed to take care of my kids| 0.000| 0.000| 0.000| 0.000| 1.000|
|
80 |
|
81 |
+
|
82 |
+
Alternatively, you can load the model using a pipeline:
|
83 |
+
|
84 |
+
```python
|
85 |
+
from transformers import pipeline
|
86 |
+
pipe = pipeline("text-classification", "NCHS/SANDS")
|
87 |
+
print( pipe(responses) )
|
88 |
+
```
|
89 |
+
|
90 |
+
```python
|
91 |
+
[{'label': 'Gibberish', 'score': 0.9978908896446228},
|
92 |
+
{'label': 'Uncertainty', 'score': 0.9950007796287537},
|
93 |
+
{'label': 'Refusal', 'score': 0.9775006771087646},
|
94 |
+
{'label': 'High-risk', 'score': 0.9804121255874634},
|
95 |
+
{'label': 'Valid', 'score': 0.9997561573982239}]
|
96 |
+
```
|
97 |
+
|
98 |
+
With the pipeline, set `top_k` to see the scores for every label instead of only the top one:
|
99 |
+
|
100 |
+
```python
|
101 |
+
pipe(responses, top_k=5)
|
102 |
+
```
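When `top_k` is set and a list of responses is passed, the pipeline generally returns one list of `{'label': ..., 'score': ...}` dictionaries per response, although the exact nesting can differ between `transformers` versions. The sketch below shows one way to flatten that structure into one row per response, similar to the table above. The helper name `scores_to_rows` and the example scores are hypothetical placeholders written by hand, not real model output.

```python
def scores_to_rows(responses, outputs):
    """Pair each response with a flat {label: score} mapping.

    Assumes `outputs` is a list with one inner list of
    {'label': str, 'score': float} dicts per response.
    """
    rows = []
    for text, label_scores in zip(responses, outputs):
        row = {"response": text}
        for item in label_scores:
            # Round for display, matching the three decimals used in the table
            row[item["label"]] = round(item["score"], 3)
        rows.append(row)
    return rows


# Hand-written stand-in for pipe(responses, top_k=5); NOT real model output
example_responses = ["idk"]
example_outputs = [[
    {"label": "Uncertainty", "score": 0.9},
    {"label": "Gibberish", "score": 0.1},
]]

rows = scores_to_rows(example_responses, example_outputs)
print(rows)
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)` if a tabular view is preferred.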
|
103 |
+
|
104 |
+
Finally, if you'd like to use a local GPU, set `device` to the GPU number (usually 0).
|
105 |
+
|
106 |
+
```python
|
107 |
+
pipe = pipeline("text-classification", "NCHS/SANDS", device=0)
|
108 |
+
```
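If the same script needs to run on machines with and without a GPU, the device number can be chosen at runtime. This is a minimal sketch, assuming PyTorch is the backend; the helper name `pick_device` is hypothetical. It relies on the pipeline convention that `device=-1` means CPU.

```python
def pick_device():
    """Return 0 (first GPU) if CUDA is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Without PyTorch installed there is no CUDA check to run; use CPU
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(device)

# The value can then be passed to the pipeline, for example:
# pipe = pipeline("text-classification", "NCHS/SANDS", device=device)
```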
|
109 |
+
|
110 |
## Uses
|
111 |
|
112 |
### Direct Uses
|
|
|
123 |
|
124 |
## Misuses and Out-of-scope Use
|
125 |
|
126 |
+
The model has been trained specifically to identify survey non-response in open-ended responses, where the respondent has given a response but the answer neither addresses the question at hand nor provides any meaningful insight. Some examples of these types of responses are `meow`, `ksdhfkshgk`, or `idk`. The model was fine-tuned on 3,000 labeled open-ended responses to web probes on questions relating to the COVID-19 pandemic, gathered from the [Research and Development Survey (RANDS)](https://www.cdc.gov/nchs/rands/index.htm) conducted by the Division of Research and Methodology at the National Center for Health Statistics. Web probes are questions that implement probing techniques from cognitive interviewing for use in survey question design, and they differ from traditional open-ended survey questions. Our labeled responses focus on COVID-19 and health topics, so performance may drop on responses outside this scope.
|
127 |
|
128 |
The responses the model was trained on also come from both web- and phone-based open-ended probes. The model may be less effective on more traditional open-ended survey questions with responses provided in other mediums.
|
129 |
|
|
|
142 |
|
143 |
#### Training Data
|
144 |
|
145 |
+
The model was fine-tuned on 3,000 labeled open-ended responses from [RANDS during COVID-19 Rounds 1 and 2](https://www.cdc.gov/nchs/rands/index.htm). The base SimCSE BERT model was trained on BookCorpus and English Wikipedia.
|
146 |
|
147 |
#### Training procedure
|
148 |
|