metadata
license: mit
language:
  - ar

akhooli/arabic-colbertv2-711k-norm

This is a ColBERTv2 model trained on a sample of the Arabic mMARCO dataset, after removing queries that contain Latin words (711K queries). It is not fully trained, but it works well for many tasks, especially ranking. The dataset was normalized before training, so normalize your queries and documents the same way before using the model.

from unicodedata import normalize

# NFKC-normalize the query; apply the same call to each document.
query_n = normalize('NFKC', query)
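The normalization step above can be applied to both queries and documents in one place. The following is a minimal sketch, assuming your queries and documents are plain Python strings; the helper names `normalize_text` and `normalize_texts` and the sample strings are hypothetical and are not part of the model or of any library.

```python
from unicodedata import normalize


def normalize_text(text: str) -> str:
    """Return the NFKC-normalized form of a single string."""
    return normalize("NFKC", text)


def normalize_texts(texts):
    """Normalize every string in an iterable (hypothetical helper)."""
    return [normalize_text(t) for t in texts]


# Illustrative inputs only. U+FEFB is the Arabic presentation-form
# ligature "lam with alef"; NFKC maps it to the two base letters
# U+0644 U+0627.
query = "\uFEFB"
docs = ["\uFEFB", "\uFF11\uFF12\uFF13"]  # second item: fullwidth digits

query_n = normalize_text(query)
docs_n = normalize_texts(docs)
```

Running the same function over queries and documents keeps both sides of the ranking in the form the model saw during training. NFKC is idempotent, so normalizing text that is already normalized leaves it unchanged.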