German BERT large paraphrase cosine

This is a sentence-transformers model (deutsche-telekom/gbert-large-paraphrase-cosine). It maps sentences and paragraphs into a 1024-dimensional dense vector space. The model is intended to be used together with SetFit to improve German few-shot text classification. It has a sibling model called deutsche-telekom/gbert-large-paraphrase-euclidean.
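
A minimal usage sketch, assuming the sentence-transformers and setfit packages are installed and the model can be downloaded under the ID deutsche-telekom/gbert-large-paraphrase-cosine. The helper names embed and load_for_setfit are hypothetical and only serve the illustration. The imports are placed inside the functions so the snippet can be loaded without either dependency.

```python
MODEL_ID = "deutsche-telekom/gbert-large-paraphrase-cosine"


def embed(sentences, model_id=MODEL_ID):
    """Encode a list of sentences into dense vectors (one row per sentence)."""
    # Lazy import: only needed when embeddings are actually computed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    return model.encode(sentences)


def load_for_setfit(model_id=MODEL_ID):
    """Wrap the sentence-transformers model as a SetFit model body."""
    # Lazy import: SetFit is optional for plain embedding use.
    from setfit import SetFitModel

    return SetFitModel.from_pretrained(model_id)
```

Calling embed(["Das ist ein Beispiel.", "Dies ist ein Beispielsatz."]) should return one vector per sentence; according to the description above, each vector has 1024 dimensions. The example sentences are made up.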

This model is based on deepset/gbert-large. Many thanks to deepset!

Loss Function
We trained the model with MultipleNegativesRankingLoss, using cosine similarity as its similarity function.
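
To illustrate the idea behind this loss, here is a simplified pure-Python sketch. It is not the sentence-transformers implementation: for a batch of (anchor, positive) pairs, each anchor is scored against every positive in the batch with cosine similarity, and a cross-entropy loss rewards the matching positive over the in-batch negatives. The function names and the scale value of 20.0 are assumptions made for this illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i].

    All other positives in the batch act as negatives for anchors[i].
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)
```

The loss is close to zero when every anchor is most similar to its own positive, and it grows when an anchor is closer to another pair's positive.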

Training Data
The model was trained on a carefully filtered subset of the deutsche-telekom/ger-backtrans-paraphrase dataset. We removed sentence pairs that met any of the following conditions:

  • min_char_len less than 15
  • jaccard_similarity greater than 0.3
  • de_token_count greater than 30
  • en_de_token_count greater than 30
  • cos_sim less than 0.85
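
The filter above can be sketched as a simple predicate. This assumes each sentence pair is available as a dictionary keyed by the column names listed above. The function name keep_pair and the sample rows are hypothetical, and this is not the original filtering script.

```python
def keep_pair(row):
    """Return True if a sentence pair survives all five filter rules."""
    return (
        row["min_char_len"] >= 15            # drop if shorter than 15 characters
        and row["jaccard_similarity"] <= 0.3  # drop if the pair overlaps too much
        and row["de_token_count"] <= 30       # drop if the German sentence is too long
        and row["en_de_token_count"] <= 30    # drop if the back-translated sentence is too long
        and row["cos_sim"] >= 0.85            # drop if the pair is not similar enough in meaning
    )


# Hypothetical rows, for illustration only.
rows = [
    {"min_char_len": 40, "jaccard_similarity": 0.1, "de_token_count": 12,
     "en_de_token_count": 13, "cos_sim": 0.93},
    {"min_char_len": 10, "jaccard_similarity": 0.1, "de_token_count": 3,
     "en_de_token_count": 3, "cos_sim": 0.95},
    {"min_char_len": 40, "jaccard_similarity": 0.6, "de_token_count": 12,
     "en_de_token_count": 12, "cos_sim": 0.99},
]
filtered = [row for row in rows if keep_pair(row)]
```

Here only the first row is kept: the second is too short, and the third has a Jaccard similarity above 0.3.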

Hyperparameters

  • learning_rate: 8.345726930229726e-06
  • num_epochs: 7
  • train_batch_size: 57
  • num_gpu: 1
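
As a rough sketch of how these values could be plugged into a sentence-transformers training run, the snippet below uses the library's fit API. It is not the authors' training script. The function name train, the input format (a list of paraphrase string pairs), and the default pooling that sentence-transformers applies when loading deepset/gbert-large are all assumptions.

```python
HYPERPARAMS = {
    "learning_rate": 8.345726930229726e-06,
    "num_epochs": 7,
    "train_batch_size": 57,
}


def train(pairs, base_model="deepset/gbert-large", params=HYPERPARAMS):
    """Fine-tune base_model on (sentence, paraphrase) string pairs."""
    # Lazy imports: torch and sentence-transformers are only needed for training.
    from torch.utils.data import DataLoader
    from sentence_transformers import InputExample, SentenceTransformer, losses

    model = SentenceTransformer(base_model)
    examples = [InputExample(texts=[a, b]) for a, b in pairs]
    loader = DataLoader(examples, shuffle=True,
                        batch_size=params["train_batch_size"])
    # Uses cosine similarity by default, matching the loss described above.
    loss = losses.MultipleNegativesRankingLoss(model)
    model.fit(
        train_objectives=[(loader, loss)],
        epochs=params["num_epochs"],
        optimizer_params={"lr": params["learning_rate"]},
    )
    return model
```

With in-batch negatives, the batch size also determines how many negatives each anchor sees, so it matters more here than with many other losses.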

Evaluation Results

We use the NLU Few-shot Benchmark - English and German dataset to evaluate this model in a German few-shot scenario.

Qualitative results

Licensing

Copyright (c) 2023 Philip May, Deutsche Telekom AG
Copyright (c) 2022 deepset GmbH

Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.
