# Sentence-BERT Quantized Model for Text Similarity & Paraphrase Detection

This repository hosts a quantized version of the Sentence-BERT (SBERT) model, fine-tuned on the Quora Question Pairs dataset for text similarity and paraphrase detection. The model computes semantic similarity between two input sentences and has been optimized for efficient deployment using ONNX quantization.
## Model Details

- Model Architecture: Sentence-BERT (`all-MiniLM-L6-v2`)
- Task: Text Similarity & Paraphrase Detection
- Dataset: Quora Question Pairs (QQP)
- Quantization: ONNX (Dynamic Quantization)
- Fine-tuning Framework: Sentence-Transformers (Hugging Face)
## Usage

### Installation

```bash
pip install sentence-transformers onnxruntime transformers
```
### Loading the Model

#### Original Fine-tuned Model

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Load the fine-tuned model
model = SentenceTransformer("fine-tuned-model")

# Encode two sentences
sentence1 = "How can I learn Python?"
sentence2 = "What is the best way to study Python?"
emb1 = model.encode(sentence1)
emb2 = model.encode(sentence2)

# Cosine similarity
score = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
print("Similarity Score:", score)

# Threshold to classify as paraphrase
print("Paraphrase" if score > 0.75 else "Not Paraphrase")
```
#### Quantized ONNX Model

```python
import numpy as np
from onnxruntime import InferenceSession
from transformers import AutoTokenizer

# Load tokenizer and ONNX session
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
session = InferenceSession("sbert_onnx/model.onnx")

def encode_onnx(session, tokenizer, sentence):
    inputs = tokenizer(sentence, return_tensors="np", padding=True, truncation=True)
    outputs = session.run(None, dict(inputs))
    # Mean-pool the token embeddings over non-padding positions,
    # matching SBERT's pooling (the raw output is per-token states).
    token_embeddings = outputs[0]               # (batch, seq_len, hidden)
    mask = inputs["attention_mask"][..., None]  # (batch, seq_len, 1)
    pooled = (token_embeddings * mask).sum(axis=1) / mask.sum(axis=1)
    return pooled[0]

# Encode and compute similarity
sentence1 = "How can I learn Python?"
sentence2 = "What is the best way to study Python?"
emb1 = encode_onnx(session, tokenizer, sentence1)
emb2 = encode_onnx(session, tokenizer, sentence2)
score = np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2))
print("Quantized Similarity Score:", score)
print("Paraphrase" if score > 0.75 else "Not Paraphrase")
```
## Performance Metrics

- Accuracy: ~0.87
- F1 Score: ~0.85
- Classification threshold: cosine similarity > 0.75
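Given the threshold above, accuracy and F1 follow directly from thresholded cosine scores. A minimal sketch with NumPy (the scores and labels here are illustrative toy values, not the real QQP evaluation set):

```python
import numpy as np

# Toy cosine-similarity scores and gold paraphrase labels (illustrative only).
scores = np.array([0.91, 0.42, 0.80, 0.55, 0.78, 0.30])
labels = np.array([1, 0, 1, 1, 1, 0])

# Apply the card's 0.75 threshold to get binary predictions.
preds = (scores > 0.75).astype(int)

tp = int(np.sum((preds == 1) & (labels == 1)))
fp = int(np.sum((preds == 1) & (labels == 0)))
fn = int(np.sum((preds == 0) & (labels == 1)))

accuracy = float(np.mean(preds == labels))
precision = tp / (tp + fp) if tp + fp else 0.0
recall = tp / (tp + fn) if tp + fn else 0.0
f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
print(f"accuracy={accuracy:.2f}  f1={f1:.2f}")
```

Sweeping the threshold over a validation split (rather than fixing 0.75) is the usual way to trade precision against recall for a new domain.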
## Fine-Tuning Details

### Dataset

- Source: Quora Question Pairs (Kaggle)
- Size: 400K+ question pairs labeled as paraphrase or not

### Training Configuration

- Epochs: 3
- Batch Size: 16
- Evaluation Steps: 1000
- Warmup Steps: 1000
- Loss Function: CosineSimilarityLoss
### Quantization

- Method: ONNX dynamic quantization
- Tool: Hugging Face Optimum + ONNX Runtime
## Repository Structure

```
.
├── fine-tuned-model/    # Fine-tuned SBERT model directory
├── sbert_onnx/          # Quantized ONNX model directory
├── test_functions.py    # Code for evaluation and testing
└── README.md            # Project documentation
```
## Limitations

- The cosine similarity threshold (0.75) may need tuning for different domains.
- ONNX quantization may introduce slight accuracy degradation compared to the full-precision model.
- The model outputs embeddings and similarity scores rather than classification logits, so the paraphrase decision rests entirely on the chosen threshold.
## Contributing

Contributions are welcome! Please open an issue or submit a pull request for bug fixes or improvements.