akhooli/arabic-colbertv2-711k-norm

This is a ColBERT V2 model trained on a sample of the Arabic mMARCO dataset after removing queries that contain Latin words (711K queries). It is not fully trained (only 22,000 steps), but it performs well on many tasks, especially ranking and information retrieval (semantic search). The dataset was normalized before training, so normalize your queries and documents the same way before using the model, for example:

from unicodedata import normalize

# Apply the same NFKC normalization used at training time to every query and document
query_n = normalize('NFKC', query)
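
For end-to-end retrieval, the checkpoint can be used like other ColBERT models. Below is a minimal sketch assuming the third-party RAGatouille library can load this checkpoint directly; the sample documents, query, and index name are illustrative placeholders, not part of this model card.

from unicodedata import normalize
from ragatouille import RAGPretrainedModel  # assumed ColBERT wrapper, not required by the model itself

# Hypothetical collection; normalize documents the same way as the training data
docs = [normalize('NFKC', d) for d in ["document one text", "document two text"]]

RAG = RAGPretrainedModel.from_pretrained("akhooli/arabic-colbertv2-711k-norm")
RAG.index(collection=docs, index_name="arabic_demo")  # hypothetical index name

query_n = normalize('NFKC', "example query text")
results = RAG.search(query=query_n, k=2)  # top-2 documents with relevance scores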
Model size: 135M params · Tensor type: F32 (Safetensors)