samheym committed on
Commit 9b148d4 · verified · 1 Parent(s): d84587b

Update README.md

Files changed (1):
  1. README.md (+15 −23)
README.md CHANGED
@@ -14,22 +14,24 @@ base_model:
   - deepset/gbert-base
   ---
 
- # GerColBERT
 
- This is a [PyLate](https://github.com/lightonai/pylate) model trained. It maps sentences & paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator.
 
- ## Model Details
 
- ### Model Description
- - **Model Type:** PyLate model
- - **Base model:** [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
- - **Document Length:** 180 tokens
- - **Query Length:** 32 tokens
- - **Output Dimensionality:** 128 tokens
- - **Similarity Function:** MaxSim
- - **Training Dataset:** samheym/ger-dpr-collection
- - **Language:** de
- <!-- - **License:** Unknown -->
@@ -55,17 +57,7 @@ model = models.ColBERT(
 
- ## Training Details
 
- ### Framework Versions
- - Python: 3.12.3
- - Sentence Transformers: 3.4.1
- - PyLate: 1.1.4
- - Transformers: 4.48.2
- - PyTorch: 2.6.0+cu124
- - Accelerate: 1.4.0
- - Datasets: 2.21.0
- - Tokenizers: 0.21.0
 
 <!--
 ## Citation
 
14
  - deepset/gbert-base
15
  ---
16
 
17
+ # Model Overview
18
 
19
+ GerColBERT is a ColBERT-based retrieval model trained on German text. It is designed for efficient late interaction-based retrieval while maintaining high-quality ranking performance.
20
+ Training Configuration
21
+
22
+ - Base Model: [deepset/gbert-base](https://huggingface.co/deepset/gbert-base)
23
+ - Training Dataset: samheym/ger-dpr-collection
24
+ - Dataset: 10% of randomly selected triples from the final dataset
25
+ - Vector Length: 128
26
+ - Maximum Document Length: 256 characters
27
+ - Batch Size: 50
28
+ - Training Steps: 80,000
29
+ - Gradient Accumulation: 1 step
30
+ - Learning Rate: 5 × 10⁻⁶
31
+ - Optimizer: AdamW
32
+ - In-Batch Negatives: Included
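
The in-batch-negatives setup listed above can be sketched in a few lines of NumPy: every query in a batch treats its paired document as the positive and the other batch documents as negatives, scored with a softmax cross-entropy. This is only an illustration of the objective, not the model's actual PyLate training code, and the function name is hypothetical:

```python
import numpy as np

def in_batch_negative_loss(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """Cross-entropy loss where query i's positive is document i and every
    other document in the batch acts as a negative."""
    # (batch, batch) similarity matrix: scores[i, j] = sim(query i, doc j).
    scores = query_vecs @ doc_vecs.T
    # Row-wise softmax, numerically stabilised.
    scores = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    # The correct "class" for query i is document i (the diagonal).
    idx = np.arange(len(query_vecs))
    return float(-np.log(probs[idx, idx]).mean())

rng = np.random.default_rng(0)
queries = rng.normal(size=(4, 128))
# Perfectly matched pairs give near-zero loss; mismatched pairs give a high loss.
matched_loss = in_batch_negative_loss(queries, queries.copy())
```

With batch size 50, each query therefore sees 49 "free" negatives per step, which is why in-batch negatives are cheap to include.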
 
 <!--
 ## Citation
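
The MaxSim late-interaction scoring that the old README mentions ("semantic textual similarity using the MaxSim operator") reduces to a small matrix computation: each query token takes its maximum similarity over all document tokens, and the per-token maxima are summed. A minimal NumPy sketch of the operator, not PyLate's implementation:

```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """ColBERT-style late interaction: for each query token vector, take its
    best (maximum) similarity over all document token vectors, then sum."""
    # (num_query_tokens, num_doc_tokens) token-level similarity matrix.
    sims = query_tokens @ doc_tokens.T
    return float(sims.max(axis=1).sum())

# Toy example with one-hot "embeddings": 3 query tokens, 4 document tokens.
q = np.eye(3, 4)   # each query token exactly matches one document token
d = np.eye(4)
score = maxsim_score(q, d)  # each of the 3 query tokens finds a perfect match
```

Because documents are encoded independently of queries, the document-side token matrices can be precomputed and indexed, which is what makes this late-interaction scheme efficient at retrieval time.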