NCHS
/

thoppe committed on
Commit 2f9ab22 · 1 Parent(s): 7fc4f15

Update README from local repo

Files changed (1)
  1. README.md +8 -7
README.md CHANGED
@@ -4,6 +4,7 @@ language:
  tags:
  - text-classification
  license: apache-2.0
+ library: Transformers
  widget:
  - text: "sdfsdfa"
  example_title: "Gibberish"
@@ -18,13 +19,13 @@ widget:
  ---
 
  # SANDS
- _Semi-Automated Non-response Detection for Surveys model (uncased)_
+ _Semi-Automated Non-response Detection for Surveys_
 
- Non-response detection designed to be used for open-ended survey responses in conjunction with human reviewers.
+ Non-response detection designed to be used for open-ended survey text in conjunction with human reviewers.
 
  ## Model Details
 
- Model Description: This model is a fine-tuned version of the supervised SimCSE BERT base uncased model. It was introduced at [AAPOR](https://www.aapor.org/) 2022 at the talk _Toward a Semi-automated item nonresponse detector model for open-response data_. The model is uncased, so it does not treats `important`, `Important`, and `ImPoRtAnT` the same.
+ Model Description: This model is a fine-tuned version of the supervised SimCSE BERT base uncased model. It was introduced at [AAPOR](https://www.aapor.org/) 2022 at the talk _Toward a Semi-automated item nonresponse detector model for open-response data_. The model is uncased, so it treats `important`, `Important`, and `ImPoRtAnT` the same.
 
  * Developed by: [National Center for Health Statistics](https://www.cdc.gov/nchs/index.htm), Centers for Disease Control and Prevention
  * Model Type: Text Classification
@@ -87,16 +88,16 @@ This model is intended to be used on survey responses for data cleaning to help
  ### Response types
 
  + **Gibberish**: Nonsensical response where the respondent entered text without regard for English syntax. Examples: `ksdhfkshgk` and `sadsadsadsadsadsadsad`
- + **Refusal**: Responses that with valid English but are either a direct refusal to answer the question asked or a response that provides no contextual relationship to the question asked. Examples: `Because` or `Meow`.
+ + **Refusal**: Responses with valid English but are either a direct refusal to answer the question asked or a response that provides no contextual relationship to the question asked. Examples: `Because` or `Meow`.
  + **Uncertainty**: Responses where the respondent does not understand the question, does not know the answer to the question, or does not know how to respond to the question. Examples: `I dont know` or `unsure what you are asking`.
  + **High-Risk**: Responses that may be valid depending on the context and content of the question. These responses require human subject matter expertise to classify as a valid response or not. Examples: `Necessity` or `Just isolating`
  + **Valid**: Responses that answer the question at hand and provide an insight to the respondents thought on the subject matter of the question. Examples: `COVID began for me when my children’s school went online and I needed to stay home to watch them` or `staying home, avoiding crowds, still wear masks`
 
  ## Misuses and Out-of-scope Use
 
- The model has been trained to identify survey non-response in open ended responses, or junk responses , where the respondent taking the survey has given a response but their answer does not respond to the question at hand or providing any meaningful insight such as `meow`, `ksdhfkshgk`, or `idk`. The model was finetuned on 3,000 labeled open-ended responses to web probes on questions relating to the COVID-19 pandemic gathered from the [Research and Development Survey or RANDS](https://www.cdc.gov/nchs/rands/index.htm) conducted by the Division of Research and Methodology at the National Center for Health Statistics. Web probes are questions implementing probing techniques from cognitive interviewing for use in survey question design and are different than traditional open-ended survey questions. The context of our labeled responses limited in focus on both COVID and health responses, so responses outside this scope may notice a drop in performance.
+ The model has been trained to specifically identify survey non-response in open ended responses where the respondent taking the survey has given a response but their answer does not respond to the question at hand or providing any meaningful insight. Some examples of these types of responses are `meow`, `ksdhfkshgk`, or `idk`. The model was finetuned on 3,000 labeled open-ended responses to web probes on questions relating to the COVID-19 pandemic gathered from the [Research and Development Survey or RANDS](https://www.cdc.gov/nchs/rands/index.htm) conducted by the Division of Research and Methodology at the National Center for Health Statistics. Web probes are questions implementing probing techniques from cognitive interviewing for use in survey question design and are different than traditional open-ended survey questions. The context of our labeled responses limited in focus on both COVID and health responses, so responses outside this scope may notice a drop in performance.
 
- The responses are also trained from both web and phone based open-ended probes. There may be limitations in model effectiveness with more traditional open ended survey questions with responses provided in other mediums.
+ The responses the model is trained on are also from both web and phone based open-ended probes. There may be limitations in model effectiveness with more traditional open ended survey questions with responses provided in other mediums.
 
  This model does not assess the factual accuracy of responses or filter out responses with different demographic biases. It was not trained to be factual of people or events and so using the model for such classification is out of scope for the abilities of the model.
 
@@ -121,4 +122,4 @@ The model was finetuned on 3,000 labeled open-ended responses from [RANDS during
  + Batch size: 16
  + Number training epochs: 4
  + Base Model pooling dimension: 768
- + Number of labels: 5
+ + Number of labels: 5
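
For readers of this diff, the five response types and the "in conjunction with human reviewers" wording in the README suggest a triage step after classification. The sketch below is one possible way to route predictions, not the authors' workflow. The label strings are copied from the "Response types" list and may not match what the model actually emits; `Prediction`, `triage`, and the `min_score` threshold are hypothetical names and values introduced only for illustration.

```python
from dataclasses import dataclass

# Label names copied from the "Response types" list in the README.
# Assumption: the deployed model may emit different label strings.
NON_RESPONSE = {"Gibberish", "Refusal", "Uncertainty"}
NEEDS_REVIEW = {"High-Risk"}
VALID = {"Valid"}


@dataclass
class Prediction:
    """One classifier output (hypothetical container, not part of the model)."""
    text: str
    label: str
    score: float  # classifier confidence in [0, 1]


def triage(pred: Prediction, min_score: float = 0.8) -> str:
    """Return 'keep', 'flag', or 'review' for a single prediction.

    min_score is a hypothetical cut-off, not a value from the README.
    """
    # High-Risk responses and low-confidence predictions go to a human.
    if pred.label in NEEDS_REVIEW or pred.score < min_score:
        return "review"
    if pred.label in NON_RESPONSE:
        return "flag"
    if pred.label in VALID:
        return "keep"
    # Unknown label string: defer to a human rather than guess.
    return "review"


if __name__ == "__main__":
    examples = [
        Prediction("ksdhfkshgk", "Gibberish", 0.99),
        Prediction("Just isolating", "High-Risk", 0.95),
        Prediction("staying home, avoiding crowds, still wear masks", "Valid", 0.97),
    ]
    for p in examples:
        print(f"{p.text!r}: {triage(p)}")
```

Routing unknown labels and low-confidence predictions to review keeps a human in the loop wherever the classifier's output is least trustworthy, which matches the semi-automated intent stated in the README.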