---
datasets:
- HiTZ/AbstRCT-ES
language:
- es
- en
pipeline_tag: token-classification
---

# Cross-lingual Argument Mining in the Medical Domain

This model is a fine-tuned version of mBERT for the argument mining task, trained on AbstRCT data in English and Spanish.

The dataset consists of abstracts covering 5 disease types, annotated for argument component detection and argument relation classification:
- `neoplasm`: 350 train, 100 dev and 50 test abstracts
- `glaucoma_test`: 100 abstracts
- `mixed_test`: 100 abstracts (20 on glaucoma, 20 on neoplasm, 20 on diabetes, 20 on hypertension, 20 on hepatitis)

Results achieved on each test set:

Test | F1-macro | F1-Claim | F1-Premise
--|-------|-------|-------
Neoplasm | 82.36 | 74.89 | 89.07
Glaucoma | 80.52 | 75.22 | 84.86
Mixed | 81.69 | 75.06 | 88.57

More information:
- 📖 Paper: [Cross-lingual Argument Mining in the Medical Domain](https://arxiv.org/abs/2301.10527)
- Code: [https://github.com/ragerri/abstrct-projections/tree/final](https://github.com/ragerri/abstrct-projections/tree/final)

You can load the model as follows:

```python
from transformers import AutoModelForTokenClassification

# Token-classification head, matching the `pipeline_tag` declared in the metadata above
model = AutoModelForTokenClassification.from_pretrained('HiTZ/mbert-argument-mining-es')
```

## Citation

````bibtex
@misc{yeginbergen2024crosslingual,
      title={Cross-lingual Argument Mining in the Medical Domain},
      author={Anar Yeginbergen and Rodrigo Agerri},
      year={2024},
      eprint={2301.10527},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
````
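
## Grouping token predictions into components

Because the model labels individual tokens, argument component detection usually needs a small post-processing step that merges consecutive tokens into claim and premise spans. The sketch below shows one way to do this. It assumes a BIO tagging scheme with labels such as `B-Claim`, `I-Claim`, `B-Premise`, `I-Premise` and `O`; these label names are an assumption, so check `model.config.id2label` for the ones the model actually uses. The helper `group_bio_spans` and the sample tokens are hypothetical and only for illustration.

```python
from typing import List, Optional, Tuple


def group_bio_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge token-level BIO labels into (component_type, text) spans.

    Assumes labels look like "B-Claim", "I-Premise" or "O" (hypothetical names).
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    for token, label in zip(tokens, labels):
        prefix, _, kind = label.partition("-")
        # A span starts on "B-", or on an "I-" tag whose type differs from the open span.
        starts_new = prefix == "B" or (prefix == "I" and kind != current_type)

        if label == "O" or starts_new:
            # Close the span that was open, if any.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []

        if label != "O":
            if not current_tokens:
                current_type = kind
            current_tokens.append(token)

    # Flush a span that runs to the end of the sequence.
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Made-up example input, not taken from AbstRCT.
tokens = ["Treatment", "A", "improved", "survival", ".",
          "Median", "survival", "was", "12", "months", "."]
labels = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "O",
          "B-Premise", "I-Premise", "I-Premise", "I-Premise", "I-Premise", "O"]

for component_type, text in group_bio_spans(tokens, labels):
    print(f"{component_type}: {text}")
```

The helper works on whole words. mBERT tokenizes into subwords, so predictions would first need to be aligned back to words (for example by keeping the label of each word's first subword) before grouping.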