---
license: mit
language:
- ar
---

# akhooli/arabic-colbertv2-711k-norm
This is a ColBERTv2 model trained on an [Arabic mMARCO dataset sample](https://huggingface.co/datasets/akhooli/ar-mmarco-sample) after removing queries containing Latin words (711K queries).
It is not fully trained (only 22,000 steps), but it works well for many tasks, especially ranking and information retrieval (semantic search).
The dataset was normalized before training, so normalize your queries and documents the same way before using the model:
```python
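from unicodedata import normalize

query = "your Arabic query or document text"  # placeholder input
query_n = normalize('NFKC', query)  # same NFKC normalization as the training data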
```
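
Because documents need the same treatment as queries, it may help to wrap the normalization in small helpers. The following is only a minimal sketch: the names `normalize_text` and `normalize_batch` are hypothetical and not part of this model or any library, and it assumes NFKC is the only normalization step required.

```python
from unicodedata import normalize


def normalize_text(text: str) -> str:
    """Apply NFKC normalization to one string (hypothetical helper)."""
    return normalize("NFKC", text)


def normalize_batch(texts: list[str]) -> list[str]:
    """Normalize every query or document in a list (hypothetical helper)."""
    return [normalize_text(t) for t in texts]


# NFKC folds Arabic presentation forms into their base letters:
# U+FEFB (lam-alef ligature) becomes U+0644 U+0627.
docs = ["\uFEFB", "plain text is left unchanged"]
docs_n = normalize_batch(docs)
```

Normalize the documents once before indexing, and apply the same function to each query at search time so both sides match what the model saw during training.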