---
language:
  - de
tags:
  - ColBERT
  - PyLate
  - sentence-transformers
  - sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
  - samheym/ger-dpr-collection
base_model:
  - deepset/gbert-base
---

# Model Overview

GerColBERT is a ColBERT-based retrieval model trained on German text. It is designed for efficient late-interaction retrieval while maintaining high-quality ranking performance.

## Training Configuration

- Base Model: deepset/gbert-base
- Training Dataset: samheym/ger-dpr-collection
- Training Subset: 10% of the triples, randomly sampled from the full dataset
- Vector Length: 128
- Maximum Document Length: 256 tokens
- Batch Size: 50
- Training Steps: 80,000
- Gradient Accumulation: 1 step
- Learning Rate: 5 × 10⁻⁶
- Optimizer: AdamW
- In-Batch Negatives: included

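As an illustration only, the sketch below shows how a configuration like the one above could be expressed with PyLate's contrastive loss and the sentence-transformers trainer. It is not the original training script: the output directory, the dataset column layout, and the `document_length` argument are assumptions.

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

from pylate import losses, models, utils

# Triples dataset (assumed column layout: query, positive, negative).
train_dataset = load_dataset("samheym/ger-dpr-collection", split="train")

# ColBERT model initialised from the German BERT base checkpoint.
# embedding_size defaults to 128; document_length=256 is assumed per the list above.
model = models.ColBERT(
    model_name_or_path="deepset/gbert-base",
    document_length=256,
)

# Contrastive loss with in-batch negatives.
train_loss = losses.Contrastive(model=model)

args = SentenceTransformerTrainingArguments(
    output_dir="gercolbert-checkpoints",  # assumed output path
    per_device_train_batch_size=50,
    gradient_accumulation_steps=1,
    learning_rate=5e-6,
    max_steps=80_000,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=train_loss,
    data_collator=utils.ColBERTCollator(tokenize_fn=model.tokenize),
)

trainer.train()
```
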
## Usage

First install the PyLate library:

```bash
pip install -U pylate
```

### Retrieval

PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. Indexing relies on the Voyager HNSW index to handle document embeddings efficiently and enable fast retrieval.

```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
```
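
The example above stops after loading the model. A minimal sketch of the remaining indexing and retrieval steps, following the general PyLate pattern, is shown below; the index folder, document IDs, and example texts are placeholders.

```python
# Step 2: Create a Voyager HNSW index and a retriever (folder/name are placeholders)
index = indexes.Voyager(
    index_folder="pylate-index",
    index_name="index",
    override=True,
)
retriever = retrieve.ColBERT(index=index)

# Step 3: Encode and index a few example documents
documents_ids = ["1", "2"]
documents = [
    "Berlin ist die Hauptstadt von Deutschland.",
    "Der Rhein ist ein Fluss in Europa.",
]
documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # documents use the document-side encoding
    show_progress_bar=True,
)
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)

# Step 4: Encode a query and retrieve the top-k documents
queries_embeddings = model.encode(
    ["Was ist die Hauptstadt von Deutschland?"],
    batch_size=32,
    is_query=True,  # queries use the query-side encoding
    show_progress_bar=True,
)
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=5,
)
print(scores)
```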