ML Wong committed on
Commit
d1e8c9d
·
1 Parent(s): cb9f0a7

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -12,13 +12,19 @@ widget:
12
  - text: "Small bilateral cervical [MASK] with unusual distribution."
13
  example_title: "Example 3"
14
  ---
15
- # Intro
16
- This model was built on Microsoft's uncased BERT trained on PubMed (`microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext`). I extracted more than 400 radiology reports for staging nasopharyngeal carcinoma (NPC). To focus on NPC, incidental findings and unrelated observations were removed prior to training. In addition, abbreviations for structures were replaced by the full terms to help the model learn suffixes and prefixes that might indicate anatomical locations (e.g. L neck -> left neck, IJC -> internal jugular chain).
17
 
18
  A tokenizer was trained based on the original PubMedBERT version, and the radiology reports were used to fine-tune PubMedBERT. A weakness of this fine-tuned model is that it cannot identify phrases or multi-word nouns: for example, "nodal metastases" is treated as two separate words, so the model tends to fill in "nodes" when these two words are masked.
19
 
20
  This model serves as a pilot analysis of whether a transformer-based deep learning model can be adopted for a corpus of NPC radiology reports.
21
 
 
 
 
 
 
 
22
  # Training Losses
23
  | Epoch | Training Loss | Validation Loss |
24
  |-------|---------------|-----------------|
 
12
  - text: "Small bilateral cervical [MASK] with unusual distribution."
13
  example_title: "Example 3"
14
  ---
15
+ # Background
16
+ This model was built on Microsoft's uncased BERT trained on PubMed (`microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext`). Approximately 500 radiology reports for staging nasopharyngeal carcinoma (NPC), written at our center by board-certified radiologists, were retrospectively retrieved with ethics approval. To focus on NPC, incidental findings and unrelated observations were removed prior to training. In addition, abbreviations for structures were replaced by the full terms to help the model learn suffixes and prefixes that might indicate anatomical locations (e.g. L neck -> left neck, IJC -> internal jugular chain).
17
 
18
  A tokenizer was trained based on the original PubMedBERT version, and the radiology reports were used to fine-tune PubMedBERT. A weakness of this fine-tuned model is that it cannot identify phrases or multi-word nouns: for example, "nodal metastases" is treated as two separate words, so the model tends to fill in "nodes" when these two words are masked.
19
 
20
  This model serves as a pilot analysis of whether a transformer-based deep learning model can be adopted for a corpus of NPC radiology reports.
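
  The abbreviation replacement described in the first paragraph of this section could look roughly like the sketch below. This is an illustration only, not the authors' preprocessing code: the function name `expand_abbreviations` and the `ABBREVIATIONS` table are hypothetical, and only the two mappings quoted in the README (L neck, IJC) are taken from the source.

```python
import re

# Hypothetical lookup table. Only these two entries appear in the README;
# a real preprocessing step would need a much longer, curated list.
ABBREVIATIONS = {
    "L neck": "left neck",
    "IJC": "internal jugular chain",
}


def expand_abbreviations(text, mapping=ABBREVIATIONS):
    """Replace whole-word abbreviations with their full terms (case-sensitive)."""
    # Try longer keys first so multi-word abbreviations are matched
    # before any shorter key they might contain.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b"
    )
    return pattern.sub(lambda m: mapping[m.group(0)], text)


report = "Enlarged node in L neck along the IJC."
print(expand_abbreviations(report))
```

  Matching is case-sensitive and anchored on word boundaries, so a capital "L" inside another word is left untouched. Whether the original pipeline used a lookup table of this kind is an assumption.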
21
 
22
+ # Affiliations
23
+ Imaging and Interventional Radiology,
24
+
25
+ Chinese University of Hong Kong
26
+
27
+
28
  # Training Losses
29
  | Epoch | Training Loss | Validation Loss |
30
  |-------|---------------|-----------------|