---
license: cc-by-sa-4.0
datasets:
- procesaur/kisobran
- procesaur/STARS
- procesaur/Vikipedija
- procesaur/Vikizvornik
- jerteh/SrpELTeC
language:
- sr
library_name: fasttext
---

FastText Sr

Обучаван над корпусом српског језика - 9.5 милијарди речи

Међу датотекама се налазе модели у Gensim, али и оригиналном формату

Trained on a Serbian language corpus of 9.5 billion words.

The repository includes models in both the Gensim format and the original fastText format.

```python
from gensim.models import FastText

# Load the Gensim-format model
model = FastText.load("TeslaFT")

examples = [
    ("dim", "zavesa"),
    ("staklo", "zavesa"),
    ("ormar", "zavesa"),
    ("prozor", "zavesa"),
    ("draperija", "zavesa")
]

for e in examples:
    print(model.wv.cosine_similarities(model.wv[e[0]], model.wv[[e[1]]])[0])
```
```
0.5305264
0.7095266
0.6041575
0.5771946
0.8870213
```
```python
from gensim.models.fasttext import load_facebook_model

# Load the model from the original (Facebook) fastText binary format
model = load_facebook_model("TeslaFT.bin")

examples = [
    ("dim", "zavesa"),
    ("staklo", "zavesa"),
    ("ormar", "zavesa"),
    ("prozor", "zavesa"),
    ("draperija", "zavesa")
]

for e in examples:
    print(model.wv.cosine_similarities(model.wv[e[0]], model.wv[[e[1]]])[0])
```
```
0.5305264
0.7095266
0.6041575
0.5771946
0.8870213
```
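Since the original binary is also provided, the same vectors can be queried with the `fasttext` library directly. The following is a minimal sketch, assuming the binary is stored as `TeslaFT.bin` as in the snippet above; the cosine similarity is computed by hand here.

```python
import fasttext
import numpy as np

# Load the original-format binary with the fasttext library
# (file name assumed to match the snippet above)
model = fasttext.load_model("TeslaFT.bin")

def cosine(a, b):
    # Plain cosine similarity between two word vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

examples = [
    ("dim", "zavesa"),
    ("staklo", "zavesa"),
    ("ormar", "zavesa"),
    ("prozor", "zavesa"),
    ("draperija", "zavesa")
]

for w1, w2 in examples:
    print(w1, w2, cosine(model.get_word_vector(w1), model.get_word_vector(w2)))
```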
Author: Mihailo Škorić (@procesaur)

Computation: TESLA project (@te-sla)


Истраживање је спроведено уз подршку Фонда за науку Републике Србије, #7276, Text Embeddings – Serbian Language Applications – TESLA

This research was supported by the Science Fund of the Republic of Serbia, #7276, Text Embeddings - Serbian Language Applications - TESLA