Upload unigrams.txt

by porupski - opened Jul 14, 2024

base: refs/heads/main ← from: refs/pr/3

Files changed: +175774 -0

Commit: Upload unigrams.txt (982ceca3)

porupski

CLASSLA - CLARIN Knowledge Centre for South Slavic Languages org Jul 14, 2024

Is the current unigrams.txt file empty? For my master's thesis I built a KenLM model from scratch on the same ParlaSpeech-HR v1.0 dataset (JSONL file), and this is the resulting unigrams.txt, which I found to work rather well.
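The thread does not say how this unigrams.txt was produced. For anyone wanting to build a similar file, here is a minimal Python sketch of one possible approach, not the author's actual method. It assumes that each JSONL line is an object with the transcript under a `text` field (the real ParlaSpeech-HR field name may differ), that whitespace splitting plus lowercasing is acceptable normalisation, and that the decoder expects one word per line. The function names `build_unigrams` and `write_unigrams` are hypothetical.

```python
import json
from collections import Counter


def build_unigrams(jsonl_path, text_field="text"):
    """Count word occurrences in a JSONL transcript file.

    ASSUMPTION: each non-empty line is a JSON object whose transcript is
    stored under `text_field`; the real dataset field name may differ.
    """
    counts = Counter()
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Simple normalisation: lowercase, split on whitespace.
            counts.update(record[text_field].lower().split())
    return counts


def write_unigrams(counts, out_path):
    """Write one word per line, most frequent first (assumed file layout)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for word, _ in counts.most_common():
            f.write(word + "\n")
```

Sorting by frequency is only a convenience for inspecting the file. A decoder that reads the list as a vocabulary should not depend on the order.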

nljubesi

CLASSLA - CLARIN Knowledge Centre for South Slavic Languages org Jul 15, 2024

You have probably seen that the much larger ParlaSpeech-HR v2.0 is now available as well (https://huggingface.co/datasets/classla/ParlaSpeech-HR), if you have good use cases for it. @5roop will look into your request and merge it after inspection. Thanks!

I also see that your interests are similar to ours, so we would be glad to exchange insights and plans going forward.

5roop changed pull request status to merged Aug 2, 2024

5roop

CLASSLA - CLARIN Knowledge Centre for South Slavic Languages org Aug 2, 2024

(edited Aug 2, 2024)

Thanks for your contribution, @porupski! I tested your unigrams on the two files we have in the repo, and the new version works OK. It would be good to check performance on a dataset other than ParlaSpeech-HR, but let's leave that for a later date.
