---
language:
- en
pipeline_tag: sentence-similarity
tags:
- Pytorch
- Sentence Transformers
- Transformers
license: "apache-2.0"
---

# Twitter4SSE

This model maps texts to 768-dimensional dense embeddings that encode semantic similarity.
It was trained with Multiple Negatives Ranking Loss (MNRL) on a Twitter dataset.
It was initialized from [BERTweet](https://huggingface.co/vinai/bertweet-base) and trained with the [Sentence-Transformers](https://www.sbert.net/) library.

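To make the training objective more concrete, here is a minimal pure-Python sketch of an MNRL-style loss with in-batch negatives. It is an illustration, not the training code used for this model: the helper names `cosine` and `mnrl_loss`, the toy vectors, and the choice of cosine similarity with `scale=20.0` are all assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over a batch of (anchor, positive) pairs.

    For anchor i, positives[i] is treated as the correct match and every
    other positive in the batch acts as a negative. The scale value is an
    assumption for this sketch.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

# Toy 2-dimensional "embeddings" (hypothetical values)
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its anchor
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the other anchor

loss_aligned = mnrl_loss(anchors, aligned)
loss_swapped = mnrl_loss(anchors, swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each anchor is most similar to its own positive and large when another item in the batch is more similar, which is what pushes paired texts together in embedding space.
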
## Usage

The model is easiest to use with the sentence-transformers library:

```
pip install -U sentence-transformers
```

```
from sentence_transformers import SentenceTransformer

sentences = ["This is the first tweet", "This is the second tweet"]

# Load the model from the Hugging Face Hub
model = SentenceTransformer('digio/Twitter4SSE')

# Encode the sentences into one embedding vector per sentence
embeddings = model.encode(sentences)
print(embeddings)
```

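Since the embeddings encode semantic similarity, a common next step is to compare two of them with cosine similarity. The sketch below uses hypothetical 3-dimensional vectors in place of real model outputs so that it runs without downloading the model; the helper name `cosine_similarity` is an assumption for this example. With real outputs you would pass rows of `embeddings` instead, and sentence-transformers also provides `util.cos_sim` for the same purpose.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy stand-ins for rows of `embeddings` (hypothetical values, not model output)
emb_a = [0.2, 0.7, 0.1]
emb_b = [0.25, 0.65, 0.05]   # points in nearly the same direction as emb_a
emb_c = [-0.6, 0.1, 0.8]     # points in a different direction

sim_ab = cosine_similarity(emb_a, emb_b)
sim_ac = cosine_similarity(emb_a, emb_c)
print(sim_ab > sim_ac)
```

Higher values indicate that two texts are closer in the embedding space.
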
To use the model without the sentence-transformers library, please refer to [this repository](https://huggingface.co/sentence-transformers) for detailed instructions on how to use Sentence Transformers models on Hugging Face.

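When a model is used without the sentence-transformers library, the per-token outputs have to be pooled into a single sentence vector by hand. The sketch below shows masked mean pooling in plain Python on toy data. Whether this model uses mean pooling is an assumption here and should be checked against the model's pooling configuration; `mean_pool` and the toy values are hypothetical.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1.

    token_embeddings: list of per-token vectors (lists of floats)
    attention_mask:   list of 0/1 flags, 0 marking padding tokens
    """
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for j, value in enumerate(vector):
                total[j] += value
    return [value / count for value in total]

# Two real tokens followed by one padding token (hypothetical values)
token_embeddings = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

sentence_embedding = mean_pool(token_embeddings, attention_mask)
print(sentence_embedding)
```

The padding token is excluded from the average, so padded and unpadded versions of the same sentence produce the same vector.
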
## Citing & Authors

The official paper [Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings](https://arxiv.org/abs/2110.02030) will be presented at EMNLP 2021. Further details will be available soon.

```
@inproceedings{di-giovanni-brambilla-2021-exploiting,
    title = "Exploiting {T}witter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings",
    author = "Di Giovanni, Marco and
      Brambilla, Marco",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.780",
    pages = "9902--9910",
}
```

The official code is available on [GitHub](https://github.com/marco-digio/Twitter4SSE).