GerColBERT

This is a PyLate model fine-tuned from deepset/gbert-base on the samheym/ger-dpr-collection dataset. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator.
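
To illustrate what the MaxSim operator computes, here is a minimal pure-Python sketch. It is not PyLate's implementation: the helper names `dot` and `maxsim` and the toy 2-dimensional vectors are assumptions made for illustration (the model itself emits 128-dimensional vectors). For each query token vector, MaxSim takes the highest dot product with any document token vector, then sums these maxima over the query tokens.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Dot product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum, over query tokens, of the best dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy token embeddings: two query tokens, and two candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5]]

print(maxsim(query, doc_a))  # 2.0
print(maxsim(query, doc_b))  # 1.0
```

In this toy case `doc_a` scores higher because each query token finds an exact match among its token vectors, whereas `doc_b` only offers a partial match for both.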

Model Details

Model Description

  • Model Type: PyLate model
  • Base model: deepset/gbert-base
  • Document Length: 180 tokens
  • Query Length: 32 tokens
  • Output Dimensionality: 128 dimensions
  • Similarity Function: MaxSim
  • Training Dataset: samheym/ger-dpr-collection
  • Language: de

Usage

First install the PyLate library:

pip install -U pylate

Retrieval

PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index is backed by the Voyager HNSW index, which stores the document embeddings and enables fast retrieval.

from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
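
The snippet above stops after loading the model. As a sketch of how the remaining indexing and retrieval steps might look, the function below follows PyLate's usual Voyager workflow: build an index, encode and add documents, then encode queries and retrieve. The function name `build_index_and_search`, its parameters, and the `"pylate-index"` folder name are hypothetical choices for this example, and the exact behaviour may differ between PyLate versions.

```python
def build_index_and_search(
    documents: dict[str, str],
    queries: list[str],
    k: int = 3,
    index_folder: str = "pylate-index",  # hypothetical folder name
):
    """Index `documents` (id -> text) and return the top-k hits for each query."""
    # Imported inside the function so it can be defined without PyLate installed.
    from pylate import indexes, models, retrieve

    # Load the ColBERT model.
    model = models.ColBERT(model_name_or_path="samheym/GerColBERT")

    # Create a Voyager index; override=True replaces any existing index of that name.
    index = indexes.Voyager(
        index_folder=index_folder,
        index_name="index",
        override=True,
    )

    # Encode the documents (is_query=False) and add them to the index.
    documents_embeddings = model.encode(
        list(documents.values()),
        is_query=False,
    )
    index.add_documents(
        documents_ids=list(documents.keys()),
        documents_embeddings=documents_embeddings,
    )

    # Encode the queries (is_query=True) and retrieve the top-k documents.
    retriever = retrieve.ColBERT(index=index)
    queries_embeddings = model.encode(queries, is_query=True)
    return retriever.retrieve(queries_embeddings=queries_embeddings, k=k)
```

Called with, for example, `{"1": "Berlin ist die Hauptstadt von Deutschland."}` as `documents` and `["Was ist die Hauptstadt von Deutschland?"]` as `queries`, this should return one ranked list of matches per query. Running it downloads the model weights and writes the index to disk.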

Training Details

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 3.4.1
  • PyLate: 1.1.4
  • Transformers: 4.48.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.4.0
  • Datasets: 2.21.0
  • Tokenizers: 0.21.0