XLM-ROBERTA-LARGE-VIEILLE-FRANCE

This is a fine-tuned version of 'FacebookAI/xlm-roberta-large', trained to identify person names (PERS), locations (LOC) and dates (DATE) in texts written in ancient French. The hope is that cross-lingual transfer will occur.

The model was fine-tuned on a corpus of hand-annotated texts made public by the University of Tours. Unfortunately, the curated dataset cannot be republished as a Hugging Face dataset. Training used both a cased and an uncased version of the corpus.
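As an illustration of what training on both a cased and an uncased version can look like, here is a minimal sketch. It is not the actual preprocessing code used for this model: the helper name `add_uncased_copies`, the BIO tag prefixes and the sample sentence are assumptions made up for the example.

```python
from typing import List, Tuple

# A sentence is a list of (token, tag) pairs.
# The BIO prefixes (B-/I-/O) are an assumption; only PERS, LOC and DATE come from this card.
Sentence = List[Tuple[str, str]]


def add_uncased_copies(sentences: List[Sentence]) -> List[Sentence]:
    """Return the original sentences followed by lower-cased copies.

    Only the tokens are lower-cased; the labels are left untouched.
    """
    uncased = [[(token.lower(), tag) for token, tag in sent] for sent in sentences]
    return list(sentences) + uncased


# Made-up sample sentence, for illustration only.
corpus = [[("Jehan", "B-PERS"), ("de", "O"), ("Tours", "B-LOC")]]
augmented = add_uncased_copies(corpus)
```

With this kind of augmentation the model sees each sentence twice, once with and once without capitalisation cues, which may help on sources where capitalisation is inconsistent.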

Note

The model is very slow, but it nevertheless runs on my laptop CPU.
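A possible way to run the model on CPU with the `transformers` token-classification pipeline is sketched below. The repository id in `MODEL_ID` is a hypothetical placeholder (this card does not state the full id), and `load_tagger` and `format_entities` are illustrative helper names, not part of any library.

```python
from typing import Dict, List

# Hypothetical placeholder: replace with the actual repository id of this model.
MODEL_ID = "your-namespace/xlm-roberta-large-vieille-france"


def load_tagger(model_id: str = MODEL_ID):
    """Build a CPU token-classification pipeline (needs `transformers` and `torch`)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge sub-word pieces into entity spans
        device=-1,  # -1 selects the CPU
    )


def format_entities(entities: List[Dict]) -> List[str]:
    """Turn pipeline output dicts into short 'LABEL: text (score)' strings."""
    return [f"{e['entity_group']}: {e['word']} ({e['score']:.2f})" for e in entities]


# Usage (downloads the weights on the first call, then runs on CPU):
#   tagger = load_tagger()
#   print("\n".join(format_entities(tagger("your text here"))))
```

Given the size of the model, expect the first call to be dominated by loading the weights.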

Evaluation

On the 'test' split of our unpublished dataset, seqeval produced the following classification report:

              precision    recall  f1-score   support

        DATE       0.99      1.00      0.99       492
         LOC       1.00      1.00      1.00      1004
        PERS       1.00      1.00      1.00       807

   micro avg       1.00      1.00      1.00      2303
   macro avg       1.00      1.00      1.00      2303
weighted avg       1.00      1.00      1.00      2303
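To make the averaged rows easier to read, the sketch below recomputes the macro and weighted F1 averages from the per-class rows. It starts from the rounded values printed in the report, so it only approximates what seqeval computed from the raw predictions. The micro average is pooled over all entities and cannot be recovered from the rounded table.

```python
# Per-class (f1-score, support), copied from the rounded report above.
report = {"DATE": (0.99, 492), "LOC": (1.00, 1004), "PERS": (1.00, 807)}


def macro_avg(rows):
    """Unweighted mean of the per-class scores."""
    return sum(score for score, _ in rows.values()) / len(rows)


def weighted_avg(rows):
    """Mean of the per-class scores, weighted by class support."""
    total = sum(support for _, support in rows.values())
    return sum(score * support for score, support in rows.values()) / total


total_support = sum(support for _, support in report.values())
macro_f1 = macro_avg(report)
weighted_f1 = weighted_avg(report)

print(total_support, round(macro_f1, 2), round(weighted_f1, 2))
```

Both averages round to 1.00, consistent with the report, and the supports add up to the 2303 entities listed in the averaged rows.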

Confusion matrix

[Figure: confusion matrix, image not reproduced here]

Model size: 559M params (Safetensors, tensor type F32)