nlp4good
/

psych-search

@@ -20,23 +20,31 @@ This model is an extension of [allenai/scibert_scivocab_uncased](https://hugging
 ```python
 from transformers import AutoTokenizer, AutoModel
-mname = "datawrestler/psych-search"
 tokenizer = AutoTokenizer.from_pretrained(mname)
 model = AutoModel.from_pretrained(mname)
 ```
-#### Limitations and bias
-This model was trained on all PubMed abstracts categorized under [Psychology and Psychiatry](https://meshb.nlm.nih.gov/treeView). As of March 1, this corresponded to approximately 3.2 million papers that contained abstract text. Of these 3.2 million papers, relevant sparse categories were back translated to increase the representation of sparser mental health categories. This included backtranslating the following:
 ## Training data
-This model was trained on all PubMed abstracts categorized under [Psychology and Psychiatry](https://meshb.nlm.nih.gov/treeView). As of March 1, this corresponded to approximately 3.2 million papers that contained abstract text. Of these 3.2 million papers, relevant sparse categories were back translated from english to french and from french to english to increase the representation of sparser mental health categories. This included backtranslating the following papers with the following categories:
-- Female
-- Adult
-- Middle Aged
 - Depressive Disorder
 - Risk Factors
 - Mental Disorders
@@ -47,11 +55,11 @@ In aggregate, this process added 557,980 additional papers to our training data.
 ## Training procedure
-Continued pretraining was on Psychology and Psychiatry PubMed papers for 10 epochs. Default parameters were used with the exception of gradient accumulation steps which was set at 4, with a per device train batch size of 32. 2 x Nvidia 3090's were used in the development of this model.
-## Eval results
-To evaluate the utility of psych-search within the mental health domain, an evaluation task was constructed by finetuning psych-search for a task similar to [BioASQ Task A](http://bioasq.org/). Here we perform large scale biomedical indexing using the MESH taxonomy associated with each paper underneath Psychology and Psychiatry. The evaluation metric is the micro F1 score across all second level descriptors under Psychology and Psychiatry. This corresponds to 38 different MESH categories used during evaluation.
 bert-base-uncased   | SciBERT Scivocab Uncased | Psych-Search
 -------|---------|----------

 ```python
 from transformers import AutoTokenizer, AutoModel
+mname = "nlp4good/psych-search"
 tokenizer = AutoTokenizer.from_pretrained(mname)
 model = AutoModel.from_pretrained(mname)
 ```
+### Limitations and bias
+This model was trained on all PubMed abstracts categorized under [Psychology and Psychiatry](https://meshb.nlm.nih.gov/treeView). As of March 1, this corresponds to approximately 3.2 million papers that contain abstract text. Of these 3.2 million papers, relevant sparse mental health categories were back-translated to increase their representation.
+There are several limitations with this dataset, including large discrepancies in the number of papers associated with [Sexual and Gender Minorities](https://meshb.nlm.nih.gov/record/ui?ui=D000072339). The training data breaks down across gender groups as follows:
+Female | Male | Sexual and Gender Minorities
+-------|---------|----------
+1,896,301 | 1,945,279 | 4,529
+Similar discrepancies are present within [Ethnic Groups](https://meshb.nlm.nih.gov/record/ui?ui=D005006) as defined within the MeSH taxonomy:
+| African Americans | Arabs | Asian Americans | Hispanic Americans | Indians, Central American | Indians, North American | Indians, South American | Indigenous Peoples | Mexican Americans |
+|-------------------|-------|-----------------|--------------------|---------------------------|-------------------------|-------------------------|--------------------|-------------------|
+| 31,027            | 2,437 | 5,612           | 18,893             | 124                       | 5,657                   | 633                     | 174                | 3,234             |
+These discrepancies can have a significant impact on information retrieval systems, downstream machine learning models, and other forms of NLP that leverage these pretrained models.
 ## Training data
+This model was trained on all PubMed abstracts categorized under [Psychology and Psychiatry](https://meshb.nlm.nih.gov/treeView). As of March 1, this corresponds to approximately 3.2 million papers that contain abstract text. Of these 3.2 million papers, those in relevant sparse categories were back-translated from English to French and from French back to English to increase the representation of sparser mental health categories. This included back-translating papers in the following categories:
 - Depressive Disorder
 - Risk Factors
 - Mental Disorders
 ## Training procedure
+Continued pretraining was done on Psychology and Psychiatry PubMed papers for 10 epochs. Default parameters were used, except for gradient accumulation steps, which was set to 4, with a per-device train batch size of 32. Two Nvidia 3090 GPUs were used in the development of this model.
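As a rough illustration of what these settings imply, the sketch below computes the effective batch size per optimizer step. It assumes both GPUs were used for data-parallel training, which the card does not state; the function name `effective_batch_size` is hypothetical.

```python
def effective_batch_size(per_device_batch_size: int,
                         grad_accum_steps: int,
                         num_devices: int) -> int:
    """Number of sequences contributing to each optimizer step."""
    return per_device_batch_size * grad_accum_steps * num_devices


# Values taken from the paragraph above. Using both GPUs in
# data-parallel mode is an assumption, not something the card confirms.
two_gpu = effective_batch_size(32, 4, 2)
one_gpu = effective_batch_size(32, 4, 1)
print(two_gpu)  # 256
print(one_gpu)  # 128
```

If only one GPU took part in each run, the second figure would apply instead.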
+## Evaluation results
+To evaluate the effectiveness of psych-search within the mental health domain, an evaluation task was constructed by fine-tuning psych-search for a task similar to [BioASQ Task A](http://bioasq.org/). Here we perform large-scale biomedical indexing using the MeSH taxonomy associated with each paper underneath Psychology and Psychiatry. The evaluation metric is the micro F1 score across all second-level descriptors within Psychology and Psychiatry. This corresponds to 38 different MeSH categories used during evaluation.
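For readers unfamiliar with the metric, here is a minimal sketch of micro F1 for multi-label classification, where true positives, false positives, and false negatives are pooled across all papers and labels before computing the score. This is not the evaluation code used for this model; the function name `micro_f1` and the labels `"A"`, `"B"`, `"C"` are placeholders for illustration only.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 over per-document label sets."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but not in the gold set
        fn += len(g - p)  # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Placeholder labels; real labels would be the 38 MeSH descriptors.
gold = [{"A", "B"}, {"C"}]
pred = [{"A"}, {"C", "B"}]
score = micro_f1(gold, pred)
print(round(score, 4))  # 0.6667
```

Because counts are pooled before averaging, frequent categories dominate the score, which is worth keeping in mind given the category imbalance in the training data.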
 bert-base-uncased   | SciBERT Scivocab Uncased | Psych-Search
 -------|---------|----------