---
license: cc-by-sa-4.0
language:
- en
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- topic-relatedness
- semantic-relatedness
base_model:
- sentence-transformers/paraphrase-multilingual-mpnet-base-v2
datasets:
- FrancescoPeriti/TRoTR
---

# TRoTR-paraphrase-multilingual-mpnet-base-v2

```FrancescoPeriti/TRoTR-paraphrase-multilingual-mpnet-base-v2``` is a fine-tuned version of ```sentence-transformers/paraphrase-multilingual-mpnet-base-v2```.

**NOTE**: In our work, we performed cross-validation across 10 different folds. For a given model (e.g., ```multi-qa-mpnet-base-cos-v1```), this process involved fine-tuning 10 separate models and reporting the average performance across the test folds. Rather than sharing all the fine-tuned models for each fold, we provide only an example model for [**FOLD1**](https://github.com/FrancescoPeriti/TRoTR/tree/main/TRoTR/datasets/FOLD_1). Please note that the results in the paper are averaged across all folds; the performance of this single model is therefore not directly comparable to the results reported in the paper.

You can find more details in our paper [TRoTR: A Framework for Evaluating the Re-contextualization of Text Reuse](https://aclanthology.org/2024.emnlp-main.774.pdf) by Francesco Periti, Pierluigi Cassotti, Stefano Montanelli, Nina Tahmasebi, and Dominik Schlechtweg. The repository of our project is [https://github.com/FrancescoPeriti/TRoTR](https://github.com/FrancescoPeriti/TRoTR).

### Model Description

This model is designed to evaluate the topic relatedness of text reuse in different contexts. The model is fine-tuned on the **TRoTR** dataset for _text recontextualization_ using _contrastive learning_.
Specifically, given a target text-reuse excerpt 𝑡 within two contexts 𝑐₁ and 𝑐₂, the model is trained to minimize the embedding distance between 𝑐₁ and 𝑐₂ if they share the same topic, and to maximize the distance if they do not.

As an example, consider three recontextualizations of the biblical passage ```John 15:13```:

- (1) It’s the wonderful pride month!! ❤️🧡💛💚💙💜 Honestly pride is everyday! Love is love don’t forget I love you ❤️. Remember this! John 15:12-13: “My command is this: Love each other as I have loved you. ```Greater love has no one than this: to lay down one’s life for one’s friends```”
- (2) At a large Crimean event today Putin quoted the Bible to defend the special military operation in Ukraine which has killed thousands and displaced millions. His words “```There is no greater love than if someone gives soul for their friends```”. And people were cheering him. Madness!!!
- (3) “Freeing people from genocide is the reason, motive & goal of the military operation we started in the Donbas & Ukraine”, Putin says, then quotes the Bible: “```There is no greater love than to lay down one’s life for one’s friends.```” It’s like Billy Graham meets North Korea

In this example, the biblical passage is incorporated within three texts with different topic recontextualizations. In particular, text (1) addresses a different topic than texts (2) and (3), while texts (2) and (3) are topic-related.

## How to Get Started with the Model

```python
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer('FrancescoPeriti/TRoTR-paraphrase-multilingual-mpnet-base-v2')

# Example sentences for text recontextualization
context1 = "It's the wonderful pride month!! ❤️🧡💛💚💙💜 Honestly pride is everyday! Love is love don't forget I love you ❤️. Remember this! John 15:12-13: My command is this: Love each other as I have loved you. Greater love has no one than this: to lay down one's life for one's friends"
context2 = "At a large Crimean event today Putin quoted the Bible to defend the special military operation in Ukraine which has killed thousands and displaced millions. His words \"Greater love has no one than this: to lay down one's life for one's friends\". And people were cheering him. Madness!!!"
context3 = "\"Freeing people from genocide is the reason, motive and goal of the military operation we started in the Donbas and Ukraine\", Putin says, then quotes the Bible: \"Greater love has no one than this: to lay down one's life for one's friends\" It's like Billy Graham meets North Korea."

# Encode the three contexts into embeddings
embedding1 = model.encode([context1])
embedding2 = model.encode([context2])
embedding3 = model.encode([context3])

# Calculate pairwise similarities
similarity1 = model.similarity(embedding1, embedding2)
similarity2 = model.similarity(embedding1, embedding3)
similarity3 = model.similarity(embedding2, embedding3)

# Print the similarity scores
print(f"Cosine similarities between the contexts: {similarity1}, {similarity2}, {similarity3}")
# Cosine similarities between the contexts: tensor([[0.2952]]), tensor([[0.3269]]), tensor([[0.8744]])
```

## Citation

Francesco Periti, Pierluigi Cassotti, Stefano Montanelli, Nina Tahmasebi, and Dominik Schlechtweg. 2024. [TRoTR: A Framework for Evaluating the Re-contextualization of Text Reuse](https://aclanthology.org/2024.emnlp-main.774/). In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 13972–13990, Miami, Florida, USA. Association for Computational Linguistics.
**BibTeX:**
```
@inproceedings{periti2024trotr,
    title = {{TRoTR: A Framework for Evaluating the Re-contextualization of Text Reuse}},
    author = "Periti, Francesco and Cassotti, Pierluigi and Montanelli, Stefano and Tahmasebi, Nina and Schlechtweg, Dominik",
    editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.774",
    pages = "13972--13990",
    abstract = "Current approaches for detecting text reuse do not focus on recontextualization, i.e., how the new context(s) of a reused text differs from its original context(s). In this paper, we propose a novel framework called TRoTR that relies on the notion of topic relatedness for evaluating the diachronic change of context in which text is reused. TRoTR includes two NLP tasks: TRiC and TRaC. TRiC is designed to evaluate the topic relatedness between a pair of recontextualizations. TRaC is designed to evaluate the overall topic variation within a set of recontextualizations. We also provide a curated TRoTR benchmark of biblical text reuse, human-annotated with topic relatedness. The benchmark exhibits an inter-annotator agreement of .811. We evaluate multiple, established SBERT models on the TRoTR tasks and find that they exhibit greater sensitivity to textual similarity than topic relatedness. Our experiments show that fine-tuning these models can mitigate such a kind of sensitivity.",
}
```