---
datasets:
- armvectores/hy_wikipedia_2023
pipeline_tag: feature-extraction
language:
- hy
library_name: fasttext
---
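Training corpus: 414M tokens
1) 73M from Armenian (hy) Wikipedia
2) 341M from the ARLIS database

Vocabulary: 74,951 unique words
Character n-grams: lengths 3 to 5
Context window: 5
Embedding dimension: 300
Architecture: skipgram
Minimum word count: 150
Training: 100 epochs, initial learning rate 0.05
Training time: 26 hours on 20 Xeon Gold cores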
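
For reference, the settings above could be passed to the fastText Python API roughly as follows. This is a minimal sketch, not the authors' training script: the mapping of "minimum number of words" to `minCount`, the use of `thread=20` for the 20 cores, and the `train` helper with its `corpus_path` and `output_path` arguments are all assumptions made for illustration.

```python
# Hyperparameters as listed on this card, mapped to fastText argument names.
# Assumption: "minimum number of words 150" corresponds to minCount=150,
# and the 20 CPU cores correspond to thread=20.
HYPERPARAMS = {
    "model": "skipgram",
    "dim": 300,       # embedding dimension
    "ws": 5,          # context window size
    "minn": 3,        # shortest character n-gram
    "maxn": 5,        # longest character n-gram
    "minCount": 150,  # ignore words seen fewer than 150 times
    "epoch": 100,
    "lr": 0.05,       # starting learning rate
    "thread": 20,
}


def train(corpus_path, output_path="model.bin"):
    """Hypothetical helper: train on a plain-text corpus and save the model."""
    import fasttext  # imported here so the settings can be inspected without it

    model = fasttext.train_unsupervised(input=corpus_path, **HYPERPARAMS)
    model.save_model(output_path)
    return model
```

Calling `train("corpus.txt")` with a whitespace-tokenized text file (the file name is a placeholder) would write `model.bin` in the same binary format that `fasttext.load_model` reads.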
How to use
1) Install fastText and huggingface_hub
```
pip install fasttext-wheel huggingface_hub
```
2) Download the model from the Hugging Face Hub and load it in Python
```
import fasttext
from huggingface_hub import hf_hub_download

# Download model.bin into the current directory and get its local path
model_path = hf_hub_download(
    repo_id="armvectores/wikipedia_arlis_tokens_fasttextskipgram_300_5",
    filename="model.bin",
    local_dir=".",
)
model = fasttext.load_model(model_path)
```
3) Examples of usage
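```
word = 'զենքեր'
# Nearest neighbours as a list of (similarity, word) pairs
print(model.get_nearest_neighbors(word))
# 300-dimensional embedding of the input text
print(model.get_sentence_vector(word))
```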
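
Beyond nearest neighbours, two words can be compared directly by the cosine similarity of their vectors. The sketch below is one possible way to do this; `cosine_similarity` and `word_similarity` are hypothetical helper names, not part of fastText, and returning 0.0 for a zero vector is a convention chosen here. Only `model.get_word_vector` is a fastText call.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(float(a) * float(b) for a, b in zip(u, v))
    norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
    norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def word_similarity(model, first, second):
    """Compare two words using vectors from a loaded fastText model."""
    return cosine_similarity(
        model.get_word_vector(first), model.get_word_vector(second)
    )
```

With the model loaded as above, `word_similarity(model, 'զենքեր', 'զենք')` returns a score between -1 and 1 (the second word is only an illustrative choice). Because the model uses character n-grams, `get_word_vector` also returns a vector for words outside the 74,951-word vocabulary, built from their subword vectors.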