	---
	language:
	- en
	pipeline_tag: sentence-similarity
	tags:
	- Pytorch
	- Sentence Transformers
	- Transformers
	license: "apache-2.0"
	---

	# Twitter4SSE

	This model maps texts to 768-dimensional dense embeddings that encode semantic similarity.
	It was trained with Multiple Negatives Ranking Loss (MNRL) on a Twitter dataset.
	It was initialized from [BERTweet](https://huggingface.co/vinai/bertweet-base) and trained with [Sentence-transformers](https://www.sbert.net/).
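
As a rough illustration of the training objective named above, the sketch below computes a multiple negatives ranking loss in plain Python. Within a batch of (anchor, positive) pairs, each anchor should score its own positive higher than every other positive in the batch. The sketch assumes cosine similarity scaled by 20, which are the defaults of `MultipleNegativesRankingLoss` in sentence-transformers; the settings actually used to train this model are not stated in this card. The names `cosine` and `mnrl_loss` are illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as sequences of floats."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over a batch where positives[i] matches anchors[i].

    All other positives in the batch act as negatives for anchors[i].
    `scale` is an assumed value (the sentence-transformers default).
    """
    losses = []
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every positive in the batch
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch
        top = max(logits)
        log_sum_exp = top + math.log(sum(math.exp(l - top) for l in logits))
        # Negative log-probability assigned to the correct pair
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-dimensional "embeddings" (made up for illustration)
toy_anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = mnrl_loss(toy_anchors, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = mnrl_loss(toy_anchors, [[0.0, 1.0], [1.0, 0.0]])
```

With the toy vectors, the loss is close to zero when each anchor is paired with its matching vector and large when the pairs are swapped, which is the behaviour the objective rewards during training.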

	## Usage

	The model is easiest to use with the sentence-transformers library:

	```
	pip install -U sentence-transformers
	```

	```
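	from sentence_transformers import SentenceTransformer

	# Example inputs; any list of strings works
	sentences = ["This is the first tweet", "This is the second tweet"]

	# Load the model from the Hugging Face Hub
	model = SentenceTransformer('digio/Twitter4SSE')
	# Compute one embedding per sentence
	embeddings = model.encode(sentences)
	print(embeddings)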
	```
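
Since the embeddings encode semantic similarity, a common next step is to compare them with cosine similarity. The following is a minimal sketch, not part of the official code: `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, and the toy vectors stand in for real embeddings so the example runs without downloading the model.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors (lists or NumPy arrays)."""
    dot = sum(float(x) * float(y) for x, y in zip(u, v))
    norm_u = math.sqrt(sum(float(x) ** 2 for x in u))
    norm_v = math.sqrt(sum(float(y) ** 2 for y in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query, candidates):
    """Return (index, score) pairs for candidates, most similar first."""
    scores = [cosine_similarity(query, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Toy 3-dimensional vectors used in place of real 768-dimensional embeddings
toy_query = [1.0, 0.0, 1.0]
toy_candidates = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
ranking = rank_by_similarity(toy_query, toy_candidates)
print(ranking)
```

With the real model, `rank_by_similarity(embeddings[0], embeddings[1:])` would rank the remaining sentences by similarity to the first one.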


	To use the model without the sentence-transformers library, refer to [this repository](https://huggingface.co/sentence-transformers) for detailed instructions on using Sentence Transformers models on Hugging Face.

	## Citing & Authors

	The official paper, [Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings](https://arxiv.org/abs/2110.02030), was presented at EMNLP 2021.

	```
	@inproceedings{di-giovanni-brambilla-2021-exploiting,
	    title = "Exploiting {T}witter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings",
	    author = "Di Giovanni, Marco and Brambilla, Marco",
	    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
	    month = nov,
	    year = "2021",
	    address = "Online and Punta Cana, Dominican Republic",
	    publisher = "Association for Computational Linguistics",
	    url = "https://aclanthology.org/2021.emnlp-main.780",
	    pages = "9902--9910",
	}
	```

	The official code is available on [GitHub](https://github.com/marco-digio/Twitter4SSE).