te-sla/Word2VecSr · Hugging Face

Word2Vec Sr

Обучаван над корпусом српског језика - 9.5 милијарди речи

Међу датотекама се налазе два модела (CBOW и SkipGram варијанте)

Trained on the Serbian language corpus - 9.5 billion words

There are two models among the files (CBOW and SkipGram variants)

from gensim.models import Word2Vec
model = Word2Vec.load("TeslaW2Vleme")
examples = [
    ("dim", "zavesa"),
    ("staklo", "zavesa"),
    ("ormar", "zavesa"),
    ("prozor", "zavesa"),
    ("draperija", "zavesa")
]
for e in examples:
    model.wv.similarity(e[0], e[1]))

Author

Mihailo Škorić

@procesaur

Computation

TESLA project

@te-sla

@inproceedings{stankovic-dict2vec,
  author    = {Ranka Stanković, Jovana Rađenović, Mihailo Škorić, Marko Putniković},
  title     = {Learning Word Embeddings using Lexical Resources and Corpora},
  booktitle   = {15th International Conference on Information Society and Technology, ISIST 2025, Kopaonik},
  year      = {2025},
  address = {Kopaonik, Belgrade}
  publisher = {SASA, Belgrade},
  url       = {https://doi.org/10.5281/zenodo.15093900}
}

Истраживање jе спроведено уз подршку Фонда за науку Републике Србиjе, #7276, Text Embeddings – Serbian Language Applications – TESLA

This research was supported by the Science Fund of the Republic of Serbia, #7276, Text Embeddings - Serbian Language Applications - TESLA

te-sla
/

Word2VecSr

Word2Vec Sr

Model tree for te-sla/Word2VecSr

Datasets used to train te-sla/Word2VecSr

Collection including te-sla/Word2VecSr

Statička vektorizacija