mlwong committed on
Commit
da28ad0
·
verified ·
1 Parent(s): bc21d9b

Update README.md

  1. README.md +2 -57
README.md CHANGED
@@ -14,11 +14,11 @@ widget:
14
  ---
15
  # **IMPORTANT**
16
 
17
- **This is an outdated model, please see my [space](https://huggingface.co/spaces/mlwong/npc-bert-demo) for a more updated version.**
18
 
19
  ---
20
 
21
- # Background
22
  This model was built on Microsoft's uncased BERT model pretrained on PubMed text (`microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext`). About 500 radiology reports for staging nasopharyngeal carcinoma (NPC), written at our center by board-certified radiologists, were retrospectively retrieved with ethics approval. To focus on NPC, incidental findings and unrelated observations were removed prior to training. In addition, abbreviations for structures were replaced by the full terms to help the model learn prefixes and suffixes that might indicate anatomical locations (e.g. L neck -> left neck, IJC -> internal jugular chain).
23
 
24
  A tokenizer was trained based on the original PubMed version, and the radiology reports were used to fine-tune PubMedBERT. This fine-tuned model has the weakness of being unable to identify phrases or multi-word nouns; e.g. "nodal metastases" is treated as two separate words, so the BERT module tends to fill in "nodes" when these two words are masked.
@@ -29,58 +29,3 @@ This model serve as a pilot analysis of whether it is possible to adopt a transf
29
  Imaging and Interventional Radiology,
30
 
31
  Chinese University of Hong Kong
32
-
33
-
34
- # Training Losses
35
- | Epoch | Training Loss | Validation Loss |
36
- |-------|---------------|-----------------|
37
- | 1 | No log | 3.474347 |
38
- | 2 | No log | 3.174083 |
39
- | 3 | No log | 2.944307 |
40
- | 4 | No log | 2.674384 |
41
- | 5 | No log | 2.574261 |
42
- | 6 | No log | 2.390012 |
43
- | 7 | No log | 2.209419 |
44
- | 8 | 2.464700 | 2.107448 |
45
- | 9 | 2.464700 | 1.974744 |
46
- | 10 | 2.464700 | 1.841606 |
47
- | 11 | 2.464700 | 1.783265 |
48
- | 12 | 2.464700 | 1.674914 |
49
- | 13 | 2.464700 | 1.572721 |
50
- | 14 | 2.464700 | 1.546106 |
51
- | 15 | 2.464700 | 1.507173 |
52
- | 16 | 1.153500 | 1.445264 |
53
- | 17 | 1.153500 | 1.394671 |
54
- | 18 | 1.153500 | 1.345976 |
55
- | 19 | 1.153500 | 1.312650 |
56
- | 20 | 1.153500 | 1.256743 |
57
- | 21 | 1.153500 | 1.233211 |
58
- | 22 | 1.153500 | 1.213525 |
59
- | 23 | 1.153500 | 1.182824 |
60
- | 24 | 0.681100 | 1.164411 |
61
- | 25 | 0.681100 | 1.128899 |
62
- | 26 | 0.681100 | 1.145166 |
63
- | 27 | 0.681100 | 1.079617 |
64
- | 28 | 0.681100 | 1.087909 |
65
- | 29 | 0.681100 | 1.102839 |
66
- | 30 | 0.681100 | 1.066386 |
67
- | 31 | 0.681100 | 1.094807 |
68
- | 32 | 0.478400 | 1.060072 |
69
- | 33 | 0.478400 | 1.016879 |
70
- | 34 | 0.478400 | 0.999808 |
71
- | 35 | 0.478400 | 0.987576 |
72
- | 36 | 0.478400 | 1.011713 |
73
- | 37 | 0.478400 | 0.996884 |
74
- | 38 | 0.478400 | 1.018533 |
75
- | 39 | 0.478400 | 1.015250 |
76
- | 40 | 0.378400 | 0.945075 |
77
- | 41 | 0.378400 | 0.950782 |
78
- | 42 | 0.378400 | 1.004242 |
79
- | 43 | 0.378400 | 0.984930 |
80
- | 44 | 0.378400 | 0.966999 |
81
- | 45 | 0.378400 | 0.988593 |
82
- | 46 | 0.378400 | 0.970504 |
83
- | 47 | 0.378400 | 0.976804 |
84
- | 48 | 0.339400 | 1.001518 |
85
- | 49 | 0.339400 | 0.986024 |
86
- | 50 | 0.339400 | 0.987911 |
 
14
  ---
15
  # **IMPORTANT**
16
 
17
+ **>>> This is an outdated model, please see my [space](https://huggingface.co/spaces/mlwong/npc-bert-demo) for a more updated version. <<<**
18
 
19
  ---
20
 
21
+ # Background--
22
  This model was built on Microsoft's uncased BERT model pretrained on PubMed text (`microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext`). About 500 radiology reports for staging nasopharyngeal carcinoma (NPC), written at our center by board-certified radiologists, were retrospectively retrieved with ethics approval. To focus on NPC, incidental findings and unrelated observations were removed prior to training. In addition, abbreviations for structures were replaced by the full terms to help the model learn prefixes and suffixes that might indicate anatomical locations (e.g. L neck -> left neck, IJC -> internal jugular chain).
23
 
24
  A tokenizer was trained based on the original PubMed version, and the radiology reports were used to fine-tune PubMedBERT. This fine-tuned model has the weakness of being unable to identify phrases or multi-word nouns; e.g. "nodal metastases" is treated as two separate words, so the BERT module tends to fill in "nodes" when these two words are masked.
 
29
  Imaging and Interventional Radiology,
30
 
31
  Chinese University of Hong Kong
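
The abbreviation expansion described in the README's Background section could be sketched roughly as below. This is only an illustration under stated assumptions, not the author's preprocessing code: the function name `expand_abbreviations` and the dictionary `ABBREVIATIONS` are hypothetical, and only the two mapping entries shown are taken from the README text.

```python
import re

# Hypothetical mapping; only these two entries appear in the README.
ABBREVIATIONS = {
    "L neck": "left neck",
    "IJC": "internal jugular chain",
}


def expand_abbreviations(text, mapping=ABBREVIATIONS):
    """Replace whole-word abbreviations in `text` with their full forms."""
    # Try longer keys first so a long abbreviation is preferred over a
    # shorter one that happens to be its prefix.
    keys = sorted(mapping, key=len, reverse=True)
    # \b anchors keep the match from firing inside a longer word.
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)


if __name__ == "__main__":
    print(expand_abbreviations("Enlarged node in L neck along IJC."))
```

Matching is case-sensitive here, which assumes abbreviations are written consistently in the reports; a real pipeline would likely need a larger, curated mapping.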