ML Wong committed
Commit · 1ee1ecc
1 Parent(s): 146d7fa
Update README.md
README.md CHANGED
@@ -7,12 +7,12 @@ license: "mit"
 widget:
 - text: "Nasopharyngeal carcinoma confined in the [MASK]."
 example_title: "Example 1"
-- text: "Nodal metastases in the left side of the [MASK]"
+- text: "Nodal metastases in the left side of the [MASK]."
 example_title: "Example 2"
-- text: "
+- text: "Small bilateral cervical [MASK] with unusual distribution."
 example_title: "Example 3"
 ---
 # Intro
 This model was built on Microsoft's BERT trained on PubMed uncased database (`microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext`). I have extracted > 400 radiology reports for staging nasopharyngeal carcinoma (NPC). To focus on NPC, incidental findings and unrelated observations are removed.

-A tokenizer was trained based on the original PubMed version, and the radiology reports were used to fine tune the PubMedBert.
+A tokenizer was trained based on the original PubMed version, and the radiology reports were used to fine tune the PubMedBert. This fine-tuned model has the weakness of being unable to identify phrases or multi-word nouns, e.g. "nodal metastases" is treated as two separate words, so the BERT module tends to fill in "nodes" when these two words are masked.
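
The multi-word limitation described in the added README sentence can be illustrated with a minimal sketch. It only shows how masking a two-word phrase produces two separate `[MASK]` slots, which a masked language model then fills position by position rather than as one unit. The helper names `mask_phrase` and `count_masks` and the example sentence are hypothetical and not part of the repository. The sketch also splits on whitespace, whereas the real tokenizer works on word pieces, so a single word may span more than one token in practice.

```python
import re

MASK = "[MASK]"  # mask token used by BERT-style models


def mask_phrase(sentence: str, phrase: str, mask_token: str = MASK) -> str:
    """Replace the first occurrence of `phrase` with one mask token per word.

    Word-level simplification: the phrase is split on whitespace, not on
    the model's word-piece vocabulary.
    """
    n_words = len(phrase.split())
    masks = " ".join([mask_token] * n_words)
    pattern = re.compile(re.escape(phrase), flags=re.IGNORECASE)
    # A function replacement avoids any escape handling in the mask string.
    masked, count = pattern.subn(lambda _match: masks, sentence, count=1)
    if count == 0:
        raise ValueError(f"phrase {phrase!r} not found in sentence")
    return masked


def count_masks(text: str, mask_token: str = MASK) -> int:
    """Count how many mask slots the model would have to fill."""
    return text.count(mask_token)


# Hypothetical example sentence, modelled on the widget examples.
sentence = "Nodal metastases in the left side of the neck."
masked = mask_phrase(sentence, "Nodal metastases")
print(masked)               # [MASK] [MASK] in the left side of the neck.
print(count_masks(masked))  # 2
```

The masked string could then be passed to a fill-mask pipeline from the `transformers` library. Because the phrase occupies two slots, the model proposes candidates for each slot on its own, which is consistent with the README's observation that it tends to fill in "nodes" instead of recovering the full phrase.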