---
license: apache-2.0
language:
- es
pipeline_tag: text-classification
tags:
- sentence-transformers
- text-classification
- bert
- biomedical
- lexical semantics
- bionlp
---

# Biomedical term classifier with Transformers in Spanish

## Table of contents
<details>
<summary>Click to expand</summary>

- [Model description](#model-description)
- [Intended uses and limitations](#intended-uses-and-limitations)
- [How to use](#how-to-use)
- [Training](#training)
- [Evaluation](#evaluation)
- [Additional information](#additional-information)
  - [Author](#author)
  - [Licensing information](#licensing-information)
  - [Citation information](#citation-information)
  - [Disclaimer](#disclaimer)

</details>

## Model description
This is a Transformers [AutoModelForSequenceClassification](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModelForSequenceClassification) model trained for multilabel biomedical text classification in Spanish.
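
As a rough illustration of what "multilabel" means at prediction time, the sketch below decodes one logit per class with an independent sigmoid and a threshold, so a term can receive zero, one, or several labels. It is a minimal sketch under stated assumptions, not the model's actual post-processing: the label names, the example logits, and the 0.5 threshold are all hypothetical.

```python
import math

# Hypothetical label names for illustration only; the real model has 21 classes.
LABELS = ["disease", "procedure", "symptom", "drug"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_multilabel(logits, labels=LABELS, threshold=0.5):
    """Return every label whose independent sigmoid probability reaches the threshold.

    Assumes one logit per label; the 0.5 threshold is an assumption.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    return [label for label, logit in zip(labels, logits) if sigmoid(logit) >= threshold]


# Made-up logits for a single term (not real model output).
example_logits = [2.1, -1.3, 0.4, -3.0]
predicted = decode_multilabel(example_logits)
print(predicted)
```

Unlike a softmax over classes, each score is thresholded on its own, which is why more than one label can be returned for the same term.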

## Intended uses and limitations
The model classifies medical entities into 21 classes, including diseases, medical procedures, symptoms, and drugs. Some classes, such as body structures, are not yet covered.

## How to use
This model is implemented as part of the KeyCARE library. First install the `keycare` package to call the Transformer classifier:

```bash
python -m pip install keycare
```

You can then run the KeyCARE pipeline that uses the Transformer model:

```python
from keycare.TermExtractor import TermExtractor

# Initialize the TermExtractor with the Transformer-based categorizer
termextractor = TermExtractor(categorization_method='transformers')
# Run the pipeline on a Spanish clinical text
text = """Acude al Servicio de Urgencias por cefalea frontoparietal derecha.
Mediante biopsia se diagnostica adenocarcinoma de próstata Gleason 4+4=8 con metástasis óseas múltiples.
Se trata con Ácido Zoledrónico 4 mg iv/4 semanas.
"""
termextractor(text)
# You can also access the class storing the Transformer model
categorizer = termextractor.categorizer
```

## Training
The base pre-trained model is SapBERT-from-roberta-base-biomedical-clinical-es, from the BSC-NLP4BIA research group. The model was trained on data from NER Gold Standard Corpora also produced by BSC-NLP4BIA, including [MedProcNER](https://temu.bsc.es/medprocner/), [DISTEMIST](https://temu.bsc.es/distemist/), [SympTEMIST](https://temu.bsc.es/symptemist/), [CANTEMIST](https://temu.bsc.es/cantemist/), and [PharmaCoNER](https://temu.bsc.es/pharmaconer/), among others.

## Evaluation
To be published

## Additional information

### Author
NLP4BIA at the Barcelona Supercomputing Center

### Licensing information
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0)

### Citation information
To be published

### Disclaimer
<details>
<summary>Click to expand</summary>

The models published in this repository are intended for general-purpose use and are available to third parties. These models may contain biases and/or other undesirable distortions.

When third parties deploy or provide systems and/or services to other parties using any of these models (or using systems based on these models), or become users of the models, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of Artificial Intelligence.

</details>