---
language:
- de
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
pipeline_tag: sentence-similarity
library_name: PyLate
datasets:
- samheym/ger-dpr-collection
base_model:
- deepset/gbert-base
---

# GerColBERT

This is a [PyLate](https://github.com/lightonai/pylate) model fine-tuned from [deepset/gbert-base](https://huggingface.co/deepset/gbert-base) on the samheym/ger-dpr-collection dataset. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator.

## Model Details

### Model Description

- **Model Type:** PyLate model
- **Base model:** [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
- **Document Length:** 180 tokens
- **Query Length:** 32 tokens
- **Output Dimensionality:** 128 dimensions
- **Similarity Function:** MaxSim
- **Training Dataset:** samheym/ger-dpr-collection
- **Language:** de

## Usage

First install the PyLate library:

```bash
pip install -U pylate
```

### Retrieval

PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index uses Voyager's HNSW implementation to store document embeddings and enable fast retrieval.

```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
model = models.ColBERT(
    model_name_or_path="samheym/GerColBERT",
)
```

## Training Details

### Framework Versions

- Python: 3.12.3
- Sentence Transformers: 3.4.1
- PyLate: 1.1.4
- Transformers: 4.48.2
- PyTorch: 2.6.0+cu124
- Accelerate: 1.4.0
- Datasets: 2.21.0
- Tokenizers: 0.21.0
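
## MaxSim Illustration

The model card names MaxSim as the similarity function. As a rough illustration of what that operator computes, the sketch below scores a query against a document by taking, for each query token vector, the largest dot product with any document token vector, and summing those maxima. It is a minimal pure-Python sketch, not PyLate's implementation: the `maxsim` helper and the toy 2-dimensional vectors are hypothetical and chosen only for readability, whereas the model itself emits 128-dimensional vectors.

```python
def maxsim(query_vecs, doc_vecs):
    """Sum, over query tokens, of the best dot product with any document token."""
    score = 0.0
    for q in query_vecs:
        # Best-matching document token for this query token.
        score += max(sum(qi * di for qi, di in zip(q, d)) for d in doc_vecs)
    return score


# Toy embeddings (hypothetical values, 2 dimensions instead of 128).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.5]]

score_a = maxsim(query, doc_a)  # each query token finds an exact match
score_b = maxsim(query, doc_b)  # each query token only partially matches
```

Because every query token is matched independently, a document scores highly when it contains a good match for each query token, regardless of where those matches occur in the document.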