lsacy committed on
Commit b16d24c · 1 Parent(s): 456e494

Upload 4 files

Files changed (4)
  1. README.md +67 -0
  2. config.json +36 -0
  3. pytorch_model.bin +3 -0
  4. vocab.txt +0 -0
README.md ADDED
@@ -0,0 +1,67 @@
+ ---
+ language: "en"
+ tags:
+ - bert
+ - medical
+ - clinical
+ - assertion
+ - negation
+ - text-classification
+ widget:
+ - text: "Patient denies [entity] SOB [entity]."
+
+ ---
+
+ # Clinical Assertion / Negation Classification BERT
+
+ ## Model description
+
+ The Clinical Assertion and Negation Classification BERT is introduced in the paper [Assertion Detection in Clinical Notes: Medical Language Models to the Rescue?
+ ](https://aclanthology.org/2021.nlpmc-1.5/). The model helps structure information in clinical patient letters by classifying medical conditions mentioned in the letter into PRESENT, ABSENT and POSSIBLE.
+
+ The model is based on the [ClinicalBERT - Bio + Discharge Summary BERT Model](https://huggingface.co/emilyalsentzer/Bio_Discharge_Summary_BERT) by Alsentzer et al. and fine-tuned on assertion data from the [2010 i2b2 challenge](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3168320/).
+
+
+ #### How to use the model
+
+ You can load the model via the transformers library:
+ ```
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, TextClassificationPipeline
+ tokenizer = AutoTokenizer.from_pretrained("bvanaken/clinical-assertion-negation-bert")
+ model = AutoModelForSequenceClassification.from_pretrained("bvanaken/clinical-assertion-negation-bert")
+
+ ```
+
+ The model expects input in the form of spans/sentences with one marked entity to classify as `PRESENT(0)`, `ABSENT(1)` or `POSSIBLE(2)`. The entity in question is identified with the special token `[entity]` surrounding it.
+
+ Example input and inference:
+ ```
+ input = "The patient recovered during the night and now denies any [entity] shortness of breath [entity]."
+
+ classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)
+
+ classification = classifier(input)
+ # [{'label': 'ABSENT', 'score': 0.9842607378959656}]
+ ```
+
+ ### Cite
+
+ When working with the model, please cite our paper as follows:
+
+ ```bibtex
+ @inproceedings{van-aken-2021-assertion,
+ title = "Assertion Detection in Clinical Notes: Medical Language Models to the Rescue?",
+ author = "van Aken, Betty and
+ Trajanovska, Ivana and
+ Siu, Amy and
+ Mayrdorfer, Manuel and
+ Budde, Klemens and
+ Loeser, Alexander",
+ booktitle = "Proceedings of the Second Workshop on Natural Language Processing for Medical Conversations",
+ year = "2021",
+ address = "Online",
+ publisher = "Association for Computational Linguistics",
+ url = "https://aclanthology.org/2021.nlpmc-1.5",
+ doi = "10.18653/v1/2021.nlpmc-1.5"
+ }
+ ```
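The README added above states that the entity to classify must be surrounded by the special token `[entity]`. A minimal sketch of building such an input from character offsets might look as follows. The helper name `mark_entity` and its offset-based interface are assumptions made for illustration; they are not part of the model card or of the transformers library.

```python
ENTITY_MARKER = "[entity]"  # special token named in the README


def mark_entity(text: str, start: int, end: int) -> str:
    """Wrap text[start:end] in [entity] markers (hypothetical helper)."""
    if not 0 <= start < end <= len(text):
        raise ValueError("entity span must lie inside the text")
    if ENTITY_MARKER in text:
        # The README describes inputs with exactly one marked entity.
        raise ValueError("text already contains an entity marker")
    return (
        f"{text[:start]}{ENTITY_MARKER} {text[start:end]} "
        f"{ENTITY_MARKER}{text[end:]}"
    )


sentence = "Patient denies SOB."
start = sentence.index("SOB")
marked = mark_entity(sentence, start, start + len("SOB"))
print(marked)  # Patient denies [entity] SOB [entity].
```

The resulting string has the same shape as the widget text in the README front matter and could be passed to the `TextClassificationPipeline` shown there.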
config.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "finetuning_task": "text_classification",
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "PRESENT",
+     "1": "ABSENT",
+     "2": "POSSIBLE"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "PRESENT": 0,
+     "ABSENT": 1,
+     "POSSIBLE": 2
+   },
+   "language": "english",
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "name": "Bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "transformers_version": "4.6.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 28997
+ }
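The `id2label` block in this config is what maps the three outputs of the classification head to `PRESENT`, `ABSENT` and `POSSIBLE`. The lookup can be sketched with the standard library alone. The logit values below are made up for illustration, and applying a softmax to obtain a score is an assumption about how single-label classification outputs are usually normalised, not something this commit specifies.

```python
import json
import math

# Excerpt of the config.json added in this commit.
config = json.loads(
    '{"id2label": {"0": "PRESENT", "1": "ABSENT", "2": "POSSIBLE"}}'
)


def decode(logits):
    """Return (label, probability) for the highest-scoring class."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # shift by max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    # JSON object keys are strings, so the class index is converted first.
    return config["id2label"][str(best)], probs[best]


label, score = decode([-1.2, 4.0, -0.5])  # made-up logits
print(label)  # ABSENT
```

Note that `label2id` stores the inverse mapping with integer values, while `id2label` keys are strings in the JSON file.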
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a5eb2077bb4192ba2ef24496c24b6c15fd2c7cc6d332fdb07170f4d602658221
+ size 433339913
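The three lines added for `pytorch_model.bin` are a Git LFS pointer rather than the weights themselves: they record the pointer spec version, the SHA-256 object ID and the size of the real file in bytes. A small sketch of reading those fields is shown here; `parse_lfs_pointer` is a hypothetical helper written for this example, not a Git LFS API.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:a5eb2077bb4192ba2ef24496c24b6c15fd2c7cc6d332fdb07170f4d602658221
size 433339913
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of a pointer file into named fields."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algorithm, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = parse_lfs_pointer(POINTER)
print(f"{pointer['size'] / 1e6:.1f} MB")  # 433.3 MB
```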
vocab.txt ADDED
The diff for this file is too large to render. See raw diff