---
datasets:
- HiTZ/AbstRCT-ES
language:
- es
- en
pipeline_tag: token-classification
---

# Cross-lingual Argument Mining in the Medical Domain

This model is mBERT (multilingual BERT) fine-tuned for argument mining on the AbstRCT dataset in English and Spanish.  
The dataset contains abstracts covering five disease types, used for argument component detection and argument relation classification. It is split as follows:

- `neoplasm`: 350 train, 100 dev and 50 test abstracts
- `glaucoma_test`: 100 abstracts
- `mixed_test`: 100 abstracts (20 on glaucoma, 20 on neoplasm, 20 on diabetes, 20 on hypertension, 20 on hepatitis) 

The results achieved for each test set: 

| Test set | F1-macro | F1-Claim | F1-Premise |
|----------|----------|----------|------------|
| Neoplasm | 82.36    | 74.89    | 89.07      |
| Glaucoma | 80.52    | 75.22    | 84.86      |
| Mixed    | 81.69    | 75.06    | 88.57      |

More information is available here:

 - 📖 Paper: [Cross-lingual Argument Mining in the Medical Domain](https://arxiv.org/abs/2301.10527)
 - Code: [https://github.com/ragerri/abstrct-projections/tree/final](https://github.com/ragerri/abstrct-projections/tree/final) 

You can load the model as follows:

```python
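from transformers import AutoModelForTokenClassification, AutoTokenizer

# Argument component detection is a token-classification (sequence labelling) task
tokenizer = AutoTokenizer.from_pretrained('HiTZ/mbert-argument-mining-es')
model = AutoModelForTokenClassification.from_pretrained('HiTZ/mbert-argument-mining-es')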
```   
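To run inference on raw text, a minimal sketch using the `transformers` `token-classification` pipeline (matching the `pipeline_tag` of this card) could look like the following. The helper names `format_components` and `tag_abstract` are illustrative and not part of the released code, and the exact label strings stored in the model configuration (for example `Claim` and `Premise`) are an assumption based on the results table above.

```python
def format_components(predictions):
    """Convert aggregated pipeline output into (label, text, score) tuples.

    `predictions` is the list of dicts returned by a token-classification
    pipeline created with an aggregation strategy, where each dict has the
    keys "entity_group", "word" and "score".
    """
    return [
        (p["entity_group"], p["word"].strip(), round(float(p["score"]), 3))
        for p in predictions
    ]


def tag_abstract(text, model_name="HiTZ/mbert-argument-mining-es"):
    """Tag argument components in `text` (hypothetical helper).

    The import is done lazily so that `format_components` can be used
    without `transformers` installed. The model is downloaded from the
    Hugging Face Hub on the first call.
    """
    from transformers import pipeline

    tagger = pipeline(
        "token-classification",
        model=model_name,
        # Merge sub-word tokens and adjacent tokens with the same label
        aggregation_strategy="simple",
    )
    return format_components(tagger(text))
```

Calling `tag_abstract("...")` on an abstract in English or Spanish should return one tuple per detected component. Note that mBERT accepts at most 512 sub-word tokens per input, so longer abstracts may need to be split into sentences first.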


## Citation

````bibtex
@misc{yeginbergen2024crosslingual,
      title={Cross-lingual Argument Mining in the Medical Domain}, 
      author={Anar Yeginbergen and Rodrigo Agerri},
      year={2024},
      eprint={2301.10527},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
````