---
language:
- de
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
- samheym/ger-dpr-collection
base_model:
- deepset/gbert-base
---
# GerColBERT
This is a PyLate model fine-tuned from deepset/gbert-base on the samheym/ger-dpr-collection dataset. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity via the MaxSim operator.
## Model Details

### Model Description
- Model Type: PyLate model
- Base model: deepset/gbert-base
- Document Length: 180 tokens
- Query Length: 32 tokens
- Output Dimensionality: 128 dimensions per token
- Similarity Function: MaxSim (see the scoring sketch below)
- Training Dataset: samheym/ger-dpr-collection
- Language: de
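For reference, the sketch below shows how MaxSim scores a query against a document at the token level. It is an illustrative standalone function, not part of the PyLate API, and assumes L2-normalized token embeddings so that dot products act as cosine similarities.

```python
import torch


def maxsim(query_embeddings: torch.Tensor, document_embeddings: torch.Tensor) -> torch.Tensor:
    """Late-interaction MaxSim score.

    query_embeddings:    (num_query_tokens, 128) L2-normalized token vectors
    document_embeddings: (num_document_tokens, 128) L2-normalized token vectors
    """
    # Similarity of every query token against every document token.
    similarities = query_embeddings @ document_embeddings.T
    # For each query token, keep its best-matching document token, then sum over query tokens.
    return similarities.max(dim=1).values.sum()
```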
## Usage
First install the PyLate library:
```bash
pip install -U pylate
```
### Retrieval
PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. Indexing is backed by a Voyager HNSW index, which stores the document token embeddings and enables fast retrieval.
```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
```
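The remaining steps of the PyLate indexing and retrieval workflow, continuing from the model loaded above, are sketched below; the index folder, document IDs, example texts, and batch size are placeholders.

```python
# Step 2: Create a Voyager index and add document embeddings (placeholder documents)
index = indexes.Voyager(
    index_folder="pylate-index",
    index_name="index",
    override=True,
)

documents_ids = ["1", "2", "3"]
documents = ["Erstes Dokument", "Zweites Dokument", "Drittes Dokument"]

documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # encode as documents (document-side prefix)
    show_progress_bar=True,
)

index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

# Step 3: Encode queries and retrieve the top-k documents via MaxSim
retriever = retrieve.ColBERT(index=index)

queries_embeddings = model.encode(
    ["Beispielanfrage"],
    batch_size=32,
    is_query=True,  # encode as queries (query-side prefix)
    show_progress_bar=True,
)

scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,
)
```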
## Training Details

### Framework Versions
- Python: 3.12.3
- Sentence Transformers: 3.4.1
- PyLate: 1.1.4
- Transformers: 4.48.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.4.0
- Datasets: 2.21.0
- Tokenizers: 0.21.0