---
license: mit
---
 
 
# Model Details

##### Model Name: NumericBERT

##### Model Type: Transformer

##### Architecture: BERT

##### Training Method: Masked Language Modeling (MLM)

##### Training Data: MIMIC-IV lab values

##### Training Hyperparameters:

- Optimizer: AdamW
- Learning Rate: 5e-5
- Masking Rate: 20%

##### Tokenization:

- Tokenizer: Custom numeric-to-text mapping using the `TextEncoder` class

### Text Encoding Process:

1. Non-negative integers are converted into uppercase letter-based representations, so that numerical values can be expressed as sequences of letters.
2. Numerical values are scaled and converted into the corresponding letters using a predefined mapping.
3. Finally, each encoded value is combined with its lab ID, using the numeric values in the specified columns ('Bic', 'Crt', 'Pot', 'Sod', 'Ure', 'Hgb', 'Plt', 'Wbc').
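
The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the `TextEncoder` implementation: it assumes a bijective base-26 letter code (0 maps to `A`, 25 to `Z`, 26 to `AA`), equal-width binning of each value between per-lab bounds, and tokens formed by concatenating the lab ID with the letters (for example `BicN`). The helper names `int_to_letters`, `scale_to_letters` and `encode_row`, the `n_bins` parameter, and the bounds passed in `ranges` are all hypothetical.

```python
import math
from typing import Dict, Mapping, Optional, Tuple

# Lab columns named in this model card.
LAB_COLUMNS = ("Bic", "Crt", "Pot", "Sod", "Ure", "Hgb", "Plt", "Wbc")


def int_to_letters(n: int) -> str:
    """Step 1 (assumed scheme): bijective base 26, so 0 -> 'A', 25 -> 'Z', 26 -> 'AA'."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    letters = ""
    n += 1  # work 1-based so every letter string is a distinct code
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def scale_to_letters(value: float, lo: float, hi: float, n_bins: int = 26) -> str:
    """Step 2 (assumed mapping): clip to [lo, hi], pick an equal-width bin, encode its index."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    clipped = min(max(value, lo), hi)
    # A value equal to hi would fall into bin n_bins, so clamp it to the last bin.
    bin_index = min(int((clipped - lo) / (hi - lo) * n_bins), n_bins - 1)
    return int_to_letters(bin_index)


def encode_row(
    row: Mapping[str, Optional[float]],
    ranges: Dict[str, Tuple[float, float]],
    n_bins: int = 26,
) -> str:
    """Step 3 (assumed format): lab ID + letters for each lab, skipping missing values."""
    tokens = []
    for lab in LAB_COLUMNS:
        value = row.get(lab)
        if value is None or math.isnan(value):  # lab not measured in this row
            continue
        lo, hi = ranges[lab]
        tokens.append(lab + scale_to_letters(value, lo, hi, n_bins))
    return " ".join(tokens)
```

For example, `encode_row({"Bic": 25.0, "Sod": None}, {"Bic": (0.0, 50.0)})` returns `"BicN"`; the bounds there are placeholders, not clinical reference ranges. With `n_bins` above 26 the bin indices simply grow into two-letter codes, which is why step 1 accepts any non-negative integer.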

### Training Data Preprocessing

- Column Selection: numerical values from the following lab tests: 'Bic' (bicarbonate), 'Crt' (creatinine), 'Pot' (potassium), 'Sod' (sodium), 'Ure' (urea), 'Hgb' (hemoglobin), 'Plt' (platelets), 'Wbc' (white blood cells).
- Text Encoding: the numeric values are encoded into text.
- Masking: 20% of the data is randomly masked during training.
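
A minimal sketch of the masking step follows. It assumes each encoded row is a list of tokens and that exactly 20% of a sequence's tokens (rounded, with at least one) are replaced by a `[MASK]` token. The function name `mask_tokens` and the `[MASK]` string are hypothetical, and since this card does not say whether BERT's 80/10/10 replacement rule is used, the sketch always substitutes `[MASK]`.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK_TOKEN = "[MASK]"  # assumed mask token; use whatever the tokenizer defines


def mask_tokens(
    tokens: Sequence[str],
    mask_rate: float = 0.2,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask a fixed fraction of tokens for masked language modeling.

    Returns (masked, labels): labels holds the original token at each masked
    position and None elsewhere, i.e. the positions the MLM loss should ignore.
    """
    if not 0.0 <= mask_rate <= 1.0:
        raise ValueError("mask_rate must be between 0 and 1")
    if rng is None:
        rng = random.Random()
    n_mask = round(mask_rate * len(tokens))
    if len(tokens) > 0 and mask_rate > 0:
        n_mask = max(1, n_mask)  # short rows still get at least one masked token
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK_TOKEN if i in positions else tok for i, tok in enumerate(tokens)]
    labels = [tok if i in positions else None for i, tok in enumerate(tokens)]
    return masked, labels
```

Masking a fixed count keeps the 20% rate exact even for rows with only a few lab tokens. Hugging Face's `DataCollatorForLanguageModeling` instead masks each token independently with probability `mlm_probability`, which matches the rate only on average.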

### Model Output

The model outputs predictions for the masked values during training. These predictions are expressed in the encoded text format rather than as raw numeric values.
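
Because predictions come back as encoded text, turning a predicted token into an approximate lab value means inverting the encoding. The sketch below assumes a bijective base-26 letter code (`A` = 0, `Z` = 25, `AA` = 26), tokens formed as the lab ID followed by letters (for example `BicN`), and equal-width bins between per-lab bounds; it returns the midpoint of the predicted bin. The names `letters_to_int` and `decode_token`, the `n_bins` parameter, and the bounds in `ranges` are hypothetical, not part of the released model.

```python
from typing import Dict, Tuple

# Lab IDs named in this model card.
LAB_COLUMNS = ("Bic", "Crt", "Pot", "Sod", "Ure", "Hgb", "Plt", "Wbc")


def letters_to_int(letters: str) -> int:
    """Invert an assumed bijective base-26 code: 'A' -> 0, 'Z' -> 25, 'AA' -> 26."""
    if not letters or not all("A" <= ch <= "Z" for ch in letters):
        raise ValueError(f"not an uppercase letter code: {letters!r}")
    value = 0
    for ch in letters:
        value = value * 26 + (ord(ch) - ord("A") + 1)
    return value - 1


def decode_token(
    token: str,
    ranges: Dict[str, Tuple[float, float]],
    n_bins: int = 26,
) -> Tuple[str, float]:
    """Split a token such as 'BicN' into its lab ID and the midpoint of its value bin."""
    for lab in LAB_COLUMNS:
        if token.startswith(lab):
            bin_index = letters_to_int(token[len(lab):])
            break
    else:
        raise ValueError(f"unknown lab ID in token {token!r}")
    if bin_index >= n_bins:
        raise ValueError(f"bin {bin_index} is outside the {n_bins} bins")
    lo, hi = ranges[lab]
    width = (hi - lo) / n_bins
    # The encoding is lossy: only the bin is known, so report its midpoint.
    return lab, lo + (bin_index + 0.5) * width
```

With placeholder bounds, `decode_token("BicN", {"Bic": (0.0, 50.0)})` returns `"Bic"` and a value of roughly 25.96. Any decoded value is only as precise as the bin width, which is one concrete form of the representation limitation noted in this card.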

### Limitations and Considerations

- Numeric Data Representation: the model relies on a custom text representation of numeric data, which may not capture complex patterns present in the original numeric values.
- Training Data Source: the model is trained on MIMIC-IV lab values, so its performance may reflect the characteristics and biases of that dataset.

### Contact Information

For inquiries or additional information, please contact:

David Restrepo

MIT Critical Data