Update README from local repo
README.md
CHANGED
@@ -12,7 +12,7 @@ widget:
     example_title: "Uncertainty"
   - text: "Because you asked"
     example_title: "Refusal"
-  - text: "
+  - text: "I am a cucumber"
     example_title: "High-risk"
   - text: "My job went remote and I needed to take care of my kids"
     example_title: "Valid"
@@ -32,7 +32,7 @@ Model Description: This model is a fine-tuned version of the supervised SimCSE
 * Language(s): English
 * License: Apache-2.0
 
-Parent Model: For more details about SimCSE, we encourage users to check out the SimCSE [Github repository](https://github.com/princeton-nlp/SimCSE),
+Parent Model: For more details about SimCSE, we encourage users to check out the SimCSE [Github repository](https://github.com/princeton-nlp/SimCSE), and the [base model](https://huggingface.co/princeton-nlp/sup-simcse-bert-base-uncased) on HuggingFace.
 
 ## How to Get Started with the Model
 
@@ -53,7 +53,7 @@ responses = [
     "sdfsdfa",
     "idkkkkk",
     "Because you asked",
-    "
+    "I am a cucumber",
     "My job went remote and I needed to take care of my kids",
 ]
 
@@ -74,8 +74,8 @@ print(df)
 |--------|---------------|-----------------|-----------|-----------------|-----------|
 |sdfsdfa| 0.998| 0.000| 0.000| 0.000| 0.000|
 |idkkkkk| 0.002| 0.995| 0.001| 0.001| 0.001|
-|Because you asked| 0.001| 0.001| 0.
-
+|Because you asked| 0.001| 0.001| 0.976| 0.006| 0.014|
+|I am a cucumber| 0.001| 0.001| 0.002| 0.797| 0.178|
 |My job went remote and I needed to take care of my kids| 0.000| 0.000| 0.000| 0.000| 1.000|
 
 
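The score table above comes from the model card's example code, which this diff only partially shows. As a rough illustration of the post-processing step alone, the sketch below turns per-class logits into row-normalized probabilities with a plain softmax; the logit values and the `CLASSES` ordering are invented for illustration, and a real run should take both from the model's classifier head and `id2label` mapping.

```python
import math

# Hypothetical label order; use the model's own id2label mapping in practice.
CLASSES = ["Gibberish", "Uncertainty", "Refusal", "High-Risk", "Valid"]

def softmax(logits):
    """Convert raw logits to probabilities that sum to 1 (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented logits for a single response.
logits = [-2.1, -1.9, -1.5, 1.8, 4.2]
probs = softmax(logits)
prediction = CLASSES[probs.index(max(probs))]
```

Each row of the printed table corresponds to the softmax of one response's logits, one column per class.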
@@ -118,7 +118,7 @@ This model is intended to be used on survey responses for data cleaning to help
 + **Gibberish**: Nonsensical responses where the respondent entered text without regard for English syntax. Examples: `ksdhfkshgk` and `sadsadsadsadsadsadsad`.
 + **Refusal**: Responses in valid English that are either a direct refusal to answer the question asked or provide no contextual relationship to the question asked. Examples: `Because` or `Meow`.
 + **Uncertainty**: Responses where the respondent does not understand the question, does not know the answer to the question, or does not know how to respond to the question. Examples: `I dont know` or `unsure what you are asking`.
-+ **High-Risk**: Responses that may be valid depending on the context and content of the question. These responses require human subject matter expertise to classify as a valid response or not. Examples: `Necessity` or `
++ **High-Risk**: Responses that may be valid depending on the context and content of the question. These responses require human subject matter expertise to classify as a valid response or not. Examples: `Necessity` or `I am a cucumber`.
 + **Valid**: Responses that answer the question at hand and provide insight into the respondent's thoughts on the subject matter of the question. Examples: `COVID began for me when my children’s school went online and I needed to stay home to watch them` or `staying home, avoiding crowds, still wear masks`.
 
 ## Misuses and Out-of-scope Use
@@ -134,6 +134,10 @@ We did not train the model to recognize non-response in any language other than
 
 ## Risks, Limitations, and Biases
 
+To investigate whether there were differences between demographic groups in sensitivity and specificity, we conducted two-tailed Z-tests across demographic groups. These included education (some college or less, and bachelor’s or more), sex (male or female), mode (computer or telephone), race and ethnicity (non-Hispanic White, non-Hispanic Black, Hispanic, and all others who are non-Hispanic), and age (18-29, 30-44, 45-59, and 60+). There were 4,813 responses to 3 probes. To control the family-wise error rate, the Bonferroni correction was applied to the alpha level (α < 0.00167).
+
+There were statistically significant differences in specificity between education levels, between modes, and between White and Black respondents. There were no statistically significant differences in sensitivity. Respondents with some college or less had lower specificity than those with more education (0.73 versus 0.80, p < 0.0001). Respondents who used a smartphone or computer to complete their survey had higher specificity than those who completed it over the telephone (0.77 versus 0.70, p < 0.0001). Black respondents had lower specificity than White respondents (0.65 versus 0.78, p < 0.0001). Effect sizes for education and mode were small (h = 0.17 and h = 0.16, respectively), while the effect size for race was between small and medium (h = 0.28).
+
 As the model was fine-tuned from SimCSE, itself fine-tuned from BERT, it will reproduce all biases inherent in these base models. Due to tokenization, the model may incorrectly classify typos, especially in acronyms. For example: `LGBTQ` is valid, while `LBGTQ` is classified as gibberish.
 
 ## Training
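The effect sizes reported above appear consistent with Cohen's h for two proportions, h = |2·arcsin(√p₁) − 2·arcsin(√p₂)|; that formula and the Bonferroni divisor of 30 (which reproduces the reported α < 0.00167 from 0.05/30) are inferences from the reported values, not stated in the model card. A minimal sketch under those assumptions:

```python
import math

def cohens_h(p1, p2):
    """Cohen's h effect size for the difference between two proportions."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))

# Reported specificity pairs from the demographic analysis.
h_education = cohens_h(0.80, 0.73)  # ~0.17 (small)
h_mode = cohens_h(0.77, 0.70)       # ~0.16 (small)
h_race = cohens_h(0.78, 0.65)       # ~0.29 from the rounded proportions; reported as 0.28

# Bonferroni-corrected alpha; 30 comparisons is an assumption that
# reproduces the reported threshold of 0.00167.
alpha = 0.05 / 30
```

Using the rounded two-decimal proportions, the computed h values land within rounding distance of the reported ones, which supports the inference.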