Question regarding model evaluation

by marcoullmann - opened 17 days ago

17 days ago

Hello Nizar,

great work. Thank you very much!

Did the model see the test data during training time? If not, can you provide details about the split ratios?

Thank you. Best,
Marco

Owner 17 days ago

Hello Marco,

Thank you for your feedback!

No, the model was trained only on the training splits provided with the databases I gathered online; it never saw the test splits. Here's a summary of the split ratios.

I'm happy to answer any other questions about the model :)

Best,
Nizar

Database	Total	Train	Test	Train_Percentage	Test_Percentage
ASGD_Test_Set	5750	0	5750	0%	100%
SDS-200_Corpus	138907	135271	3636	97%	3%
SPC	150702	147370	3332	98%	2%
STT4SG-350	247527	222922	24605	90%	10%
SwissDialZH1_1	30921	30921	0	100%	0%
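
For anyone who wants to double-check the ratios, here is a minimal sketch that recomputes the percentage columns from the sample counts in the table above. The counts are copied from the table; the helper name `split_percentages` and the rounding to whole percent are assumptions made for this illustration.

```python
# Sample counts copied from the table above: (total, train, test).
SPLITS = {
    "ASGD_Test_Set": (5750, 0, 5750),
    "SDS-200_Corpus": (138907, 135271, 3636),
    "SPC": (150702, 147370, 3332),
    "STT4SG-350": (247527, 222922, 24605),
    "SwissDialZH1_1": (30921, 30921, 0),
}


def split_percentages(total, train, test):
    """Return (train %, test %) rounded to whole percent (hypothetical helper)."""
    if train + test != total:
        raise ValueError("train and test counts do not add up to the total")
    return round(100 * train / total), round(100 * test / total)


if __name__ == "__main__":
    for name, (total, train, test) in SPLITS.items():
        train_pct, test_pct = split_percentages(total, train, test)
        print(f"{name}\t{total}\t{train}\t{test}\t{train_pct}%\t{test_pct}%")

    # Totals across all databases, for an overall view of the split.
    all_train = sum(train for _, train, _ in SPLITS.values())
    all_test = sum(test for _, _, test in SPLITS.values())
    print(f"All\t{all_train + all_test}\t{all_train}\t{all_test}")
```

Note that ASGD_Test_Set is test-only and SwissDialZH1_1 is train-only, so the per-database ratios differ a lot from the overall ratio.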

16 days ago

Thank you! I hope I can use the model soon in a real-world project.

Owner 16 days ago

You're welcome! That's cool, keep me updated :)
In any case, here's what I did with the model for my master's thesis:

Sped up inference by converting the model with CTranslate2.
Added word alignment with WhisperX, running on the CT2-converted model. (There are one or two tricks to make it run properly, since v3 natively uses more log-mel channels; I can provide some code to do that if needed.)
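
A minimal sketch of what the CTranslate2 conversion step mentioned above could look like, assuming the `ctranslate2` and `transformers` packages are installed so that the `ct2-transformers-converter` command is available. This is not necessarily the procedure used for the thesis. The model path and output directory are hypothetical placeholders, and copying `preprocessor_config.json` next to the converted weights is only one plausible way to let downstream tooling pick up the larger log-mel channel count; it also assumes both copied files exist in the source model directory.

```python
import shlex
import subprocess


def build_convert_command(model, output_dir, quantization="float16"):
    """Build the argument list for ct2-transformers-converter.

    `model` and `output_dir` are placeholders supplied by the caller.
    """
    return [
        "ct2-transformers-converter",
        "--model", model,
        "--output_dir", output_dir,
        # Assumption: keeping the preprocessor config alongside the weights
        # lets the loader see the log-mel channel count of the source model.
        "--copy_files", "tokenizer.json", "preprocessor_config.json",
        "--quantization", quantization,
    ]


def run_conversion(command):
    """Run the converter; raises CalledProcessError if it fails."""
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical paths, replace with the real model and target directory.
    cmd = build_convert_command("path/to/finetuned-whisper", "finetuned-whisper-ct2")
    print(shlex.join(cmd))
    # run_conversion(cmd)  # uncomment to actually convert the model
```

The script only prints the command by default, so it can be inspected before anything is downloaded or converted.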
