license: mit
datasets:
- dleemiller/wiki-sim
- sentence-transformers/stsb
language:
- en
metrics:
- spearmanr
- pearsonr
base_model:
- answerdotai/ModernBERT-large
pipeline_tag: text-classification
library_name: sentence-transformers
tags:
- cross-encoder
- modernbert
- sts
- stsb
- stsbenchmark-sts
model-index:
- name: CrossEncoder based on answerdotai/ModernBERT-large
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.9256352639938148
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.9214535713008775
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.933041295532361
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.9316328000924687
      name: Spearman Cosine
ModernBERT Cross-Encoder: Semantic Similarity (STS)
Cross encoders are high-performing encoder models that compare two texts and output a similarity score between 0 and 1.
I've found the cross-encoder/stsb-roberta-large model to be very useful in creating evaluators for LLM outputs.
Models like it are simple to use, fast and very accurate.
Like many people, I was excited about the architecture and training uplift from ModernBERT (answerdotai/ModernBERT-large).
So I've applied it to the stsb cross encoder, which is a very handy model. Additionally, I've added pretraining on a much larger semi-synthetic dataset, dleemiller/wiki-sim, that targets this kind of objective.
The inference efficiency, expanded context and simplicity make this a really nice platform for an evaluator model.
Features
- High performing: Achieves Pearson: 0.9256 and Spearman: 0.9215 on the STS-Benchmark test set.
- Efficient architecture: Based on the ModernBERT-large design (395M parameters), offering faster inference speeds.
- Extended context length: Processes sequences up to 8192 tokens, great for LLM output evals.
- Diversified training: Pretrained on dleemiller/wiki-sim and fine-tuned on sentence-transformers/stsb.
Performance
Model | STS-B Test Pearson | STS-B Test Spearman | Context Length | Parameters | Speed |
---|---|---|---|---|---|
ModernCE-large-sts | 0.9256 | 0.9215 | 8192 | 395M | Medium |
ModernCE-base-sts | 0.9162 | 0.9122 | 8192 | 149M | Fast |
stsb-roberta-large | 0.9147 | - | 512 | 355M | Slow |
stsb-distilroberta-base | 0.8792 | - | 512 | 82M | Fast |
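The Pearson and Spearman columns measure how closely predicted scores track the human-annotated STS-B scores: Pearson on the raw values, Spearman on their ranks. As a rough illustration only (this is not the evaluation code used to produce the table, and the helper names and sample numbers below are made up for the sketch), both can be computed in plain Python:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical gold labels and model predictions, both on a 0-1 scale.
gold = [0.0, 0.2, 0.5, 0.8, 1.0]
pred = [0.05, 0.30, 0.45, 0.70, 0.95]
print(pearson(gold, pred), spearman(gold, pred))
```

Because Spearman only looks at ordering, it is the more relevant number when the scores are used to rank candidates rather than read as absolute values.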
Usage
To use ModernCE for semantic similarity tasks, you can load the model with the Hugging Face sentence-transformers library:
from sentence_transformers import CrossEncoder
# Load ModernCE model
model = CrossEncoder("dleemiller/ModernCE-large-sts")
# Predict similarity scores for sentence pairs
sentence_pairs = [
("It's a wonderful day outside.", "It's so sunny today!"),
("It's a wonderful day outside.", "He drove to work earlier."),
]
scores = model.predict(sentence_pairs)
print(scores) # Outputs: array([0.9184, 0.0123], dtype=float32)
Output
The model returns similarity scores in the range [0, 1], where higher scores indicate stronger semantic similarity.
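For LLM output evaluation, one simple way to use these scores is to compare each generated answer against a reference and apply a cutoff. The sketch below assumes a hypothetical helper `judge_outputs` and an arbitrary threshold of 0.7; neither comes from this model card, and the threshold should be tuned on your own data. It takes any scoring callable, so a stand-in scorer is used here to keep the example runnable without downloading the model.

```python
from typing import Callable, Sequence


def judge_outputs(
    predict: Callable[[Sequence[tuple[str, str]]], Sequence[float]],
    references: Sequence[str],
    candidates: Sequence[str],
    threshold: float = 0.7,  # arbitrary cutoff, an assumption for this sketch
) -> list[dict]:
    """Score each (reference, candidate) pair and flag scores at or above `threshold`."""
    if len(references) != len(candidates):
        raise ValueError("references and candidates must have the same length")
    pairs = list(zip(references, candidates))
    scores = predict(pairs)
    return [
        {
            "reference": ref,
            "candidate": cand,
            "score": float(score),
            "passed": float(score) >= threshold,
        }
        for (ref, cand), score in zip(pairs, scores)
    ]


# Stand-in scorer so the sketch runs offline: 1.0 for identical strings, 0.1 otherwise.
def fake_predict(pairs):
    return [1.0 if a == b else 0.1 for a, b in pairs]


results = judge_outputs(
    fake_predict,
    ["Paris is the capital of France."],
    ["Paris is the capital of France."],
)
print(results[0]["passed"])
```

With the real model, pass `model.predict` from the Usage section in place of `fake_predict`.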
Training Details
Pretraining
The model was pretrained on the pair-score-sampled subset of the dleemiller/wiki-sim dataset. This dataset provides diverse sentence pairs with semantic similarity scores, helping the model build a robust understanding of relationships between sentences.
- Classifier Dropout: a somewhat large classifier dropout of 0.3, to reduce overreliance on teacher scores.
- Objective: Match the STS-B-style similarity scores produced by the teacher model, cross-encoder/stsb-roberta-large.
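Training against teacher scores means each sentence pair is labeled by an existing model rather than by a human. How the wiki-sim labels were actually generated is not described here, so the following is only a sketch of the general idea: the function name `label_with_teacher`, the record keys, the batch size, and the clamping step are all assumptions, and a stand-in teacher replaces the real model so the example runs offline.

```python
def label_with_teacher(pairs, teacher_predict, batch_size=32):
    """Return training records whose label is the teacher's similarity score."""
    pairs = list(pairs)
    records = []
    for start in range(0, len(pairs), batch_size):
        batch = pairs[start:start + batch_size]
        scores = teacher_predict(batch)
        for (sentence1, sentence2), score in zip(batch, scores):
            # Clamp to [0, 1] so labels stay in the range the student outputs.
            label = min(1.0, max(0.0, float(score)))
            records.append(
                {"sentence1": sentence1, "sentence2": sentence2, "score": label}
            )
    return records


# Stand-in teacher: scores a pair by word overlap (Jaccard), purely for illustration.
def toy_teacher(batch):
    scores = []
    for a, b in batch:
        words_a, words_b = set(a.lower().split()), set(b.lower().split())
        scores.append(len(words_a & words_b) / len(words_a | words_b))
    return scores


examples = label_with_teacher(
    [("the cat sat", "the cat sat"), ("the cat sat", "dogs bark loudly")],
    toy_teacher,
)
print(examples)
```

In a real pipeline, `teacher_predict` would be the `predict` method of a loaded teacher cross encoder.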
Fine-Tuning
Fine-tuning was performed on the sentence-transformers/stsb dataset.
Validation Results
The model achieved the following test set performance after fine-tuning:
- Pearson Correlation: 0.9256
- Spearman Correlation: 0.9215
Model Card
- Architecture: ModernBERT-large
- Tokenizer: The tokenizer that ships with answerdotai/ModernBERT-large.
- Pretraining Data: dleemiller/wiki-sim (pair-score-sampled)
- Fine-Tuning Data: sentence-transformers/stsb
Thank You
Thanks to the AnswerAI team for providing the ModernBERT models, and the Sentence Transformers team for their leadership in transformer encoder models.
Citation
If you use this model in your research, please cite:
@misc{moderncestsb2025,
author = {Miller, D. Lee},
title = {ModernCE STS: An STS cross encoder model},
year = {2025},
publisher = {Hugging Face Hub},
url = {https://huggingface.co/dleemiller/ModernCE-large-sts},
}
License
This model is licensed under the MIT License.