NCHS/SANDS

Commit 70e1e0e · 1 parent: 2f9ab22
thoppe committed: Update README from local repo

Files changed (1): README.md (+31 −3)

README.md CHANGED
@@ -39,7 +39,6 @@ Parent Model: For more details about SimCSE, we encourage users to check out the
 ### Example of classification of a set of responses:
 
 ```python
-
 from transformers import AutoTokenizer, AutoModelForSequenceClassification
 import torch
 import pandas as pd
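(Editor's note: the diff elides the body of this example; the next hunk header shows it ends with `print(df)`, and the table below shows its output. A minimal sketch of what the omitted steps presumably look like, assuming the `NCHS/SANDS` checkpoint; the response strings other than the two visible in the output table are hypothetical placeholders.)

```python
# Hypothetical reconstruction of the elided example -- not the repository's exact code.
tokenizer = AutoTokenizer.from_pretrained("NCHS/SANDS")
model = AutoModelForSequenceClassification.from_pretrained("NCHS/SANDS")

# Placeholder inputs; only the last two appear in the README's output table.
responses = [
    "ksdhfkshgk",
    "idk",
    "Necessity",
    "My job went remote and I needed to take care of my kids",
]

# Tokenize, run the classifier, and convert logits to per-class probabilities.
inputs = tokenizer(responses, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)

# One row per response, one column per label.
df = pd.DataFrame(
    probs.numpy(),
    index=responses,
    columns=list(model.config.id2label.values()),
)
print(df)
```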
@@ -79,6 +78,35 @@ print(df)
 |Necessity| 0.001| 0.001| 0.002| 0.980| 0.016|
 |My job went remote and I needed to take care of my kids| 0.000| 0.000| 0.000| 0.000| 1.000|
 
+
+Alternatively, you can load the model using a pipeline:
+
+```python
+from transformers import pipeline
+pipe = pipeline("text-classification", "NCHS/SANDS")
+print(pipe(responses))
+```
+
+```python
+[{'label': 'Gibberish', 'score': 0.9978908896446228},
+ {'label': 'Uncertainty', 'score': 0.9950007796287537},
+ {'label': 'Refusal', 'score': 0.9775006771087646},
+ {'label': 'High-risk', 'score': 0.9804121255874634},
+ {'label': 'Valid', 'score': 0.9997561573982239}]
+```
+
+With the pipeline, set `top_k` to see the full output:
+
+```python
+pipe(responses, top_k=5)
+```
+
+Finally, if you'd like to use a local GPU, set the device to the GPU number (usually 0):
+
+```python
+pipe = pipeline("text-classification", "NCHS/SANDS", device=0)
+```
+
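(Editor's aside, not part of the README: `pipeline` also accepts `device=-1` for CPU, so a sketch that falls back gracefully when no GPU is present might look like this.)

```python
import torch
from transformers import pipeline

# Pick GPU 0 when available, otherwise fall back to CPU (-1).
device = 0 if torch.cuda.is_available() else -1
pipe = pipeline("text-classification", "NCHS/SANDS", device=device)
```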
 ## Uses
 
 ### Direct Uses
@@ -95,7 +123,7 @@ This model is intended to be used on survey responses for data cleaning to help
 
 ## Misuses and Out-of-scope Use
 
-The model has been trained specifically to identify survey non-response in open-ended responses, where the respondent has given an answer that does not address the question at hand or provide any meaningful insight. Some examples of these types of responses are `meow`, `ksdhfkshgk`, or `idk`. The model was finetuned on 3,000 labeled open-ended responses to web probes on questions relating to the COVID-19 pandemic, gathered from the [Research and Development Survey or RANDS](https://www.cdc.gov/nchs/rands/index.htm) conducted by the Division of Research and Methodology at the National Center for Health Statistics. Web probes are questions implementing probing techniques from cognitive interviewing for use in survey question design, and they differ from traditional open-ended survey questions. Our labeled responses are limited in focus to COVID and health topics, so performance may drop on responses outside this scope.
+The model has been trained specifically to identify survey non-response in open-ended responses, where the respondent has given an answer that does not address the question at hand or provide any meaningful insight. Some examples of these types of responses are `meow`, `ksdhfkshgk`, or `idk`. The model was fine-tuned on 3,000 labeled open-ended responses to web probes on questions relating to the COVID-19 pandemic, gathered from the [Research and Development Survey or RANDS](https://www.cdc.gov/nchs/rands/index.htm) conducted by the Division of Research and Methodology at the National Center for Health Statistics. Web probes are questions implementing probing techniques from cognitive interviewing for use in survey question design, and they differ from traditional open-ended survey questions. Our labeled responses are limited in focus to COVID and health topics, so performance may drop on responses outside this scope.
 
 The responses the model is trained on also come from both web- and phone-based open-ended probes, so there may be limitations in model effectiveness on more traditional open-ended survey questions answered in other media.
 
@@ -114,7 +142,7 @@ Some examples of refusal responses also can appear to be valid as they did not o
 
 #### Training Data
 
-The model was finetuned on 3,000 labeled open-ended responses from [RANDS during COVID-19 Rounds 1 and 2](https://www.cdc.gov/nchs/rands/index.htm). The base SimCSE BERT model was trained on BookCorpus and English Wikipedia.
+The model was fine-tuned on 3,000 labeled open-ended responses from [RANDS during COVID-19 Rounds 1 and 2](https://www.cdc.gov/nchs/rands/index.htm). The base SimCSE BERT model was trained on BookCorpus and English Wikipedia.
 
 #### Training procedure
 