Hello, I've had some problems using this model, mainly because the sentence-transformers library is quite sensitive to missing files. I've added the missing files from the base model (https://huggingface.co/sentence-transformers/xlm-r-distilroberta-base-paraphrase-v1) and added the "clean_up_tokenization_spaces": true parameter to tokenizer_config.json.

The warnings raised without these files and this parameter were escalated to an error by the prompt benchmark tool I'm using, so I thought this change might be useful to you and help avoid future issues.
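
For anyone who wants to apply the tokenizer change to a local copy of the model, here is a minimal sketch of how the parameter could be set. It only uses the Python standard library; the helper name `set_cleanup_flag` and the path in the usage note are my own placeholders, not something from this repository.

```python
import json
from pathlib import Path


def set_cleanup_flag(config_path: Path) -> dict:
    """Set "clean_up_tokenization_spaces": true in a tokenizer config file.

    Hypothetical helper for illustration: it keeps every existing key
    and only adds or overwrites the one parameter.
    """
    # Start from the existing config if there is one, else from an empty dict.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}

    config["clean_up_tokenization_spaces"] = True

    # Write the file back as indented JSON so the diff stays readable.
    config_path.write_text(
        json.dumps(config, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )
    return config
```

Usage would look like `set_cleanup_flag(Path("path/to/model/tokenizer_config.json"))`, where the path is an assumption about where the model files are stored locally.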

The benchmark scores changed minimally:
Original DE: 0.8549768717756436
Updated DE: 0.8549777340634312 (slight increase)
Original EN: 0.8660333530928567
Updated EN: 0.8660334102061337 (slight increase)
Original Cross: 0.8525445612883897
Updated Cross: 0.8525444308395488 (slight decrease)
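
To put the size of these changes in perspective, the differences can be computed directly from the numbers above. This is just a quick sketch; the variable and function names are mine, and the scores are copied from the list as-is. Every difference comes out below 1e-6 in absolute value.

```python
# (original, updated) score pairs copied from the list above.
scores = {
    "DE": (0.8549768717756436, 0.8549777340634312),
    "EN": (0.8660333530928567, 0.8660334102061337),
    "Cross": (0.8525445612883897, 0.8525444308395488),
}


def score_deltas(pairs: dict) -> dict:
    """Return updated minus original for each benchmark entry."""
    return {name: updated - original for name, (original, updated) in pairs.items()}


deltas = score_deltas(scores)

# Positive values are increases, negative values are decreases.
for name, delta in deltas.items():
    print(f"{name}: {delta:+.3e}")
```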

jimmymeister changed pull request status to open