ML Wong committed on
Commit 1ee1ecc · 1 Parent(s): 146d7fa

Update README.md

  1. README.md +3 -3
README.md CHANGED
@@ -7,12 +7,12 @@ license: "mit"
  widget:
  - text: "Nasopharyngeal carcinoma confined in the [MASK]."
  example_title: "Example 1"
- - text: "Nodal metastases in the left side of the [MASK]"
+ - text: "Nodal metastases in the left side of the [MASK]."
  example_title: "Example 2"
- - text: "The [MASK] infiltrated the clivus."
+ - text: "Small bilateral cervical [MASK] with unusual distribution."
  example_title: "Example 3"
  ---
  # Intro
  This model was built on Microsoft's BERT trained on PubMed uncased database (`microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext`). I have extracted > 400 radiology reports for staging nasopharyngeal carcinoma (NPC). To focus on NPC, incidental findings and unrelated observations are removed.
 
- A tokenizer was trained based on the original PubMed version, and the radiology reports were used to fine tune the PubMedBert.
+ A tokenizer was trained based on the original PubMed version, and the radiology reports were used to fine tune the PubMedBert. This fine tuned model has a weakness: it cannot identify phrases or multi-word nouns. For example, "nodal metastases" is treated as two separate words, so the BERT module tends to fill in "nodes" when these two words are masked.