---
language:
- de
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
- samheym/ger-dpr-collection
base_model:
- deepset/gbert-base
---
# GerColBERT
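This is a [PyLate](https://github.com/lightonai/pylate) model fine-tuned from [deepset/gbert-base](https://huggingface.co/deepset/gbert-base) on the samheym/ger-dpr-collection dataset. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity with the MaxSim operator.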
## Model Details
### Model Description
- **Model Type:** PyLate model
- **Base model:** [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
- **Document Length:** 180 tokens
- **Query Length:** 32 tokens
- **Output Dimensionality:** 128 dimensions per token
- **Similarity Function:** MaxSim
- **Training Dataset:** samheym/ger-dpr-collection
- **Language:** German (de)
<!-- - **License:** Unknown -->
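The MaxSim similarity listed above can be illustrated with a small NumPy sketch. It assumes the usual ColBERT-style late-interaction definition (for every query token, take the best-matching document token and sum those maxima). The `maxsim` helper and the toy 2-dimensional vectors are illustrative only and are not part of PyLate; the real model produces 128-dimensional token embeddings.

```python
import numpy as np


def maxsim(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """Late-interaction score between one query and one document.

    query_emb: (num_query_tokens, dim) token embeddings
    doc_emb:   (num_doc_tokens, dim) token embeddings
    """
    # Pairwise dot products: one row per query token, one column per document token.
    sim = query_emb @ doc_emb.T
    # Best-matching document token for each query token, summed over query tokens.
    return float(sim.max(axis=1).sum())


# Toy example with unit-length 2-d vectors (hypothetical values).
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, -1.0]])

score = maxsim(query, doc)
print(score)  # approximately 1.8: 1.0 from the first query token, 0.8 from the second
```

Because every query token contributes its own maximum, the score rewards documents that cover all parts of the query rather than only one of them.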
## Usage
First install the PyLate library:
```bash
pip install -U pylate
```
### Retrieval
PyLate provides a streamlined interface for indexing and retrieving documents with ColBERT models. The index is built on Voyager's HNSW implementation, which stores the document embeddings and enables fast approximate nearest-neighbor retrieval.
```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
```
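After loading the model, the remaining steps are to build the index, encode and add the documents, and query it. The sketch below wraps those steps in a hypothetical helper, `index_and_search`, following the usage pattern from the PyLate documentation as I understand it for the 1.1.x releases; the function name, the generated document ids, and the default folder name are assumptions made for illustration. Running it downloads the model and writes an index to disk.

```python
def index_and_search(documents, queries, k=3, index_folder="pylate-index"):
    """Index `documents` with GerColBERT and return the top-k hits per query.

    Hypothetical helper for illustration; not part of PyLate.
    """
    # Imported here so that defining the helper does not itself require PyLate.
    from pylate import indexes, models, retrieve

    # Step 1: Load the ColBERT model
    model = models.ColBERT(model_name_or_path="samheym/GerColBERT")

    # Step 2: Initialize the Voyager index (override=True replaces an existing index)
    index = indexes.Voyager(
        index_folder=index_folder,
        index_name="index",
        override=True,
    )

    # Step 3: Encode the documents (is_query=False selects document-side encoding)
    documents_ids = [str(i) for i in range(len(documents))]
    documents_embeddings = model.encode(
        documents,
        batch_size=32,
        is_query=False,
        show_progress_bar=False,
    )

    # Step 4: Add the document embeddings and their ids to the index
    index.add_documents(
        documents_ids=documents_ids,
        documents_embeddings=documents_embeddings,
    )

    # Step 5: Encode the queries and retrieve the best-scoring documents
    retriever = retrieve.ColBERT(index=index)
    queries_embeddings = model.encode(
        queries,
        batch_size=32,
        is_query=True,
        show_progress_bar=False,
    )
    return retriever.retrieve(queries_embeddings=queries_embeddings, k=k)
```

A call such as `index_and_search(["Berlin ist die Hauptstadt von Deutschland."], ["Was ist die Hauptstadt von Deutschland?"], k=1)` should return one result list per query. Setting `is_query` correctly matters because queries and documents are encoded with different maximum lengths (32 and 180 tokens for this model).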
## Training Details
### Framework Versions
- Python: 3.12.3
- Sentence Transformers: 3.4.1
- PyLate: 1.1.4
- Transformers: 4.48.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.4.0
- Datasets: 2.21.0
- Tokenizers: 0.21.0