Upload unigrams.txt

by porupski - opened Jul 14, 2024

base: refs/heads/main ← from: refs/pr/3

Files changed: +175774 -0

Upload unigrams.txt (982ceca3)

porupski

Jul 14, 2024

The current unigrams.txt file is empty? For my master's thesis I built a KenLM model from scratch from the same ParlaSpeechHR-v1.0 dataset (JSONL file), and this is the resulting unigrams.txt, which I found to work rather well.
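For readers who want to reproduce something similar, here is a minimal sketch of how a unigram list could be derived from JSONL transcripts. It is not necessarily the procedure used for this PR. The field name `text`, the lowercasing, the whitespace tokenization, and the function name `build_unigrams` are all assumptions made for illustration; the sketch also assumes unigrams.txt holds one token per line.

```python
import json
from collections import Counter
from typing import Iterable


def build_unigrams(jsonl_lines: Iterable[str], text_field: str = "text") -> list[str]:
    """Collect the vocabulary of a JSONL transcript file.

    `text_field` is a hypothetical key name; adjust it to the actual schema.
    """
    counts: Counter[str] = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Assumption: lowercase and split on whitespace; a real pipeline
        # may need the same normalization as the acoustic model's vocabulary.
        counts.update(record.get(text_field, "").lower().split())
    # Most frequent tokens first; ties broken alphabetically.
    return [w for w, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Tiny in-memory example standing in for the real JSONL file.
sample = ['{"text": "dobar dan dobar"}', '{"text": "Dan je"}']
unigrams = build_unigrams(sample)
unigrams_txt = "\n".join(unigrams)  # content one could write to unigrams.txt
```

With a real dataset, the lines would come from an open file handle instead of `sample`, and `unigrams_txt` would be written to disk.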

nljubesi

CLASSLA - CLARIN Knowledge Centre for South Slavic Languages org Jul 15, 2024

You have probably seen that the much larger ParlaSpeech-HR v2.0 is now available as well (https://huggingface.co./datasets/classla/ParlaSpeech-HR), in case you have good use cases for it. @5roop will look into your request and merge it upon inspection, thanks!

I see that you otherwise have interests similar to ours. We would not mind exchanging insights and plans going forward.

5roop changed pull request status to merged Aug 2, 2024

5roop

CLASSLA - CLARIN Knowledge Centre for South Slavic Languages org Aug 2, 2024

edited Aug 2, 2024

Thanks for your contribution, @porupski. I tested your unigrams on the two files we have in the repo, and the new version works OK. It would be good to check performance on a non-ParlaSpeech-HR dataset, but let's leave that for a later date.
