BERTić-COMtext-SR-legal-lemma-ekavica

BERTić-COMtext-SR-legal-lemma-ekavica is a variant of the BERTić model, fine-tuned on the task of lemmatization tag prediction in Serbian legal texts written in the Ekavian pronunciation. The model was fine-tuned for 20 epochs on the Ekavian variant of the COMtext.SR.legal dataset.
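As a rough illustration of how such a checkpoint might be queried, the sketch below uses the Hugging Face `transformers` library to predict one lemmatization tag per pre-tokenized word. It rests on several assumptions not confirmed by this card: that the checkpoint is published under the repository id `ICEF-NLP/bcms-bertic-comtext-sr-legal-lemma-ekavica`, that it exposes a standard token-classification head, and that the tag of a word is read from its first subword. The helper name `predict_lemma_tags` is hypothetical.

```python
# Assumed repository id; adjust if the checkpoint lives elsewhere.
MODEL_ID = "ICEF-NLP/bcms-bertic-comtext-sr-legal-lemma-ekavica"


def predict_lemma_tags(words):
    """Return one predicted tag string per word in a pre-tokenized sentence.

    Assumes the checkpoint loads as a token-classification model and that
    a fast tokenizer is available (needed for word_ids()).
    """
    # Imported lazily so the module can be imported without these packages.
    import torch
    from transformers import AutoModelForTokenClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_ID)
    model.eval()

    # Gold tokens go in as a list of words; the tokenizer splits them further.
    encoding = tokenizer(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**encoding).logits[0]
    predicted_ids = logits.argmax(dim=-1).tolist()

    tags = []
    seen_words = set()
    for position, word_index in enumerate(encoding.word_ids(0)):
        # Skip special tokens and non-initial subwords.
        if word_index is None or word_index in seen_words:
            continue
        seen_words.add(word_index)
        tags.append(model.config.id2label[predicted_ids[position]])
    return tags
```

Calling `predict_lemma_tags(["Zakon", "stupa", "na", "snagu", "."])` would download the weights and return a list of five tag strings. How those tags are turned into lemmas depends on the tag scheme used in training, which this card does not spell out.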

Benchmarking

This model was evaluated on the task of lemmatizing Serbian legal texts. Lemmatization was performed using the predicted string edit tags, as described in a JTDH 2024 paper.
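To make the idea of a string edit tag concrete, here is a minimal sketch of one common suffix-based scheme: a tag records how many trailing characters to strip from the word form and which string to append. This tag format, and the function names `make_edit_tag` and `apply_edit_tag`, are assumptions chosen for illustration. The exact scheme used by the model may differ.

```python
def make_edit_tag(form, lemma):
    """Derive a (strip_count, suffix) tag that turns `form` into `lemma`.

    The tag keeps the longest common prefix of form and lemma, strips the
    rest of the form, and appends the rest of the lemma.
    """
    prefix_len = 0
    for form_char, lemma_char in zip(form, lemma):
        if form_char != lemma_char:
            break
        prefix_len += 1
    return (len(form) - prefix_len, lemma[prefix_len:])


def apply_edit_tag(form, tag):
    """Apply a (strip_count, suffix) tag to a word form to get its lemma."""
    strip_count, suffix = tag
    # Never strip more characters than the form contains.
    keep = max(0, len(form) - strip_count)
    return form[:keep] + suffix


if __name__ == "__main__":
    for form, lemma in [("zakona", "zakon"), ("prava", "pravo"), ("sudovi", "sud")]:
        tag = make_edit_tag(form, lemma)
        print(form, tag, apply_edit_tag(form, tag))
```

Because many word forms share the same tag (for example, "strip one character, append nothing"), the set of tags stays small enough to be predicted as a token-level classification label.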

The model was compared to previous lemmatization approaches that relied on the srLex inflectional lexicon.

Accuracy was used as the evaluation metric, and gold-tokenized text was taken as input. All of the previous large language models were fine-tuned for 15 epochs. CLASSLA and BERTić-SETimes were tested directly on the entire COMtext.SR.legal.ekavica corpus. BERTić-COMtext-SR-legal-MSD-ekavica, BERTić-COMtext-SR-legal-lemma-ekavica, and SrBERTa were fine-tuned and evaluated on the COMtext.SR.legal.ekavica corpus using 10-fold cross-validation (CV).
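The evaluation protocol can be sketched as follows: token-level accuracy over gold tokens, plus a split of the corpus into 10 folds. This is only an illustration. It assumes contiguous, unshuffled folds over a list of items such as sentences, and the function names `lemma_accuracy` and `k_fold_indices` are hypothetical. The actual fold construction is defined in the authors' experiment code.

```python
def lemma_accuracy(predicted, gold):
    """Fraction of tokens whose predicted lemma equals the gold lemma."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds.

    Fold sizes differ by at most one item; each item is in exactly one
    test fold.
    """
    if k < 2 or k > n_items:
        raise ValueError("k must be between 2 and n_items")
    base, remainder = divmod(n_items, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < remainder else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


if __name__ == "__main__":
    print(lemma_accuracy(["zakon", "pravo", "sud"], ["zakon", "pravo", "sudija"]))
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

In a setup like this, each fold's model is fine-tuned on the training indices and scored on the held-out indices, so every token in the corpus is predicted exactly once by a model that did not see it during fine-tuning.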

The code and data needed to run these experiments are available in the COMtext.SR GitHub repository.

Results

| Model | Lemma ACC |
|---|---|
| CLASSLA-SR | 0.9432 |
| BERTić-SETimes | 0.9649 |
| BERTić-COMtext-SR-legal-MSD-ekavica | 0.9666 |
| SrBERTa | 0.9391 |
| BERTić-COMtext-SR-legal-lemma-ekavica | 0.9850 |