metadata
language:
  - de
tags:
  - ColBERT
  - PyLate
  - sentence-transformers
  - sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
  - samheym/ger-dpr-collection
base_model:
  - deepset/gbert-base

GerColBERT

This is a PyLate model fine-tuned from deepset/gbert-base on the samheym/ger-dpr-collection dataset. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator.
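For readers new to late interaction, the MaxSim operator can be sketched in a few lines of NumPy. This is an illustrative sketch, not PyLate's implementation: the function name `maxsim` and the toy 2-dimensional vectors are assumptions made for the example, whereas the model itself emits one 128-dimensional vector per token.

```python
import numpy as np


def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Sum, over query tokens, of the best similarity to any document token."""
    # (num_query_tokens, num_doc_tokens) matrix of token-to-token dot products
    sim = query_emb @ doc_emb.T
    # Keep the best-matching document token per query token, then sum
    return float(sim.max(axis=1).sum())


# Toy unit-length token embeddings (hypothetical values, 2 dimensions for readability)
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, -1.0]])

score = maxsim(query, doc)
print(score)
```

Each query token contributes only its single best match, so a document scores highly when every query token finds a close counterpart somewhere in the document.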

Model Details

Model Description

  • Model Type: PyLate model
  • Base model: deepset/gbert-base
  • Document Length: 180 tokens
  • Query Length: 32 tokens
  • Output Dimensionality: 128 dimensions
  • Similarity Function: MaxSim
  • Training Dataset: samheym/ger-dpr-collection
  • Language: de

Usage

First install the PyLate library:

pip install -U pylate

Retrieval

PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index leverages the Voyager HNSW index to efficiently handle document embeddings and enable fast retrieval.

from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
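To give a feel for what index-based retrieval speeds up, the sketch below ranks a tiny collection exhaustively by MaxSim. It does not use PyLate or Voyager: the helper names `random_token_embeddings` and `rank_exhaustively` are hypothetical, and random unit vectors stand in for real model output.

```python
import numpy as np

rng = np.random.default_rng(0)


def random_token_embeddings(num_tokens: int, dim: int = 128) -> np.ndarray:
    """Hypothetical stand-in for model output: random unit-length token vectors."""
    emb = rng.normal(size=(num_tokens, dim))
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)


def rank_exhaustively(query_emb: np.ndarray, docs_emb: dict, k: int = 2) -> list:
    """Score every document with MaxSim and return the k best (doc_id, score) pairs."""
    scores = {
        doc_id: float((query_emb @ doc_emb.T).max(axis=1).sum())
        for doc_id, doc_emb in docs_emb.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Documents may differ in token count (this model truncates at 180 tokens)
docs = {
    "d1": random_token_embeddings(40),
    "d2": random_token_embeddings(180),
    "d3": random_token_embeddings(12),
}

# Build the query from five tokens of d3, so d3 should rank first
query = docs["d3"][:5]
top = rank_exhaustively(query, docs, k=2)
print(top)
```

Exhaustive scoring touches every token of every document, which is why large collections rely on an approximate nearest-neighbour index such as HNSW instead.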

Training Details

Framework Versions

  • Python: 3.12.3
  • Sentence Transformers: 3.4.1
  • PyLate: 1.1.4
  • Transformers: 4.48.2
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.4.0
  • Datasets: 2.21.0
  • Tokenizers: 0.21.0