---
language:
- de
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
- samheym/ger-dpr-collection
base_model:
- deepset/gbert-base
---

# GerColBERT

This is a [PyLate](https://github.com/lightonai/pylate) model trained. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator.

## Model Details

### Model Description
- **Model Type:** PyLate model
- **Base model:** [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
- **Document Length:** 180 tokens
- **Query Length:** 32 tokens
- **Output Dimensionality:** 128 tokens
- **Similarity Function:** MaxSim
- **Training Dataset:** samheym/ger-dpr-collection
- **Language:** de
<!-- - **License:** Unknown -->


## Usage
First install the PyLate library:

```bash
pip install -U pylate
```

### Retrieval 

PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index leverages the Voyager HNSW index to efficiently handle document embeddings and enable fast retrieval.

```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path=samheym/GerColBERT,
)
```


## Training Details

### Framework Versions
- Python: 3.12.3
- Sentence Transformers: 3.4.1
- PyLate: 1.1.4
- Transformers: 4.48.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.4.0
- Datasets: 2.21.0
- Tokenizers: 0.21.0

<!--
## Citation

### BibTeX

<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->