Model Description
The model uses the Wav2Vec2 architecture, trained on the SUPERB dataset for the keyword spotting task, and was fine-tuned to identify dental click utterances (https://en.wikipedia.org/wiki/Dental_click) in speech. It was trained for 10 epochs on a limited quantity of speech (~1.5 hours) from only one speaker. The model should therefore not be assumed to generalize to other speakers or languages without further training data or rigorous testing.
The model was evaluated for accuracy on a held-out test set comprising 20% of the available data and scored 97%.
Uses
The model can be used via the transformers library or via the Hugging Face hosted Inference API to the right. I would caution against using the 'Record from browser' option, as the model may erroneously identify the user's mouse click as a speech utterance. Audio files for upload should be 1 second long, in WAV format with 16-bit signed integer PCM encoding.
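As a rough sketch of local use, the snippet below first checks that a clip matches the format described above (about 1 second, 16-bit PCM WAV) and then passes it to a transformers audio-classification pipeline. The helper names `check_clip` and `classify_clip`, the 0.05 s tolerance, and the `model_id` argument are illustrative assumptions and not part of this model card; substitute the actual repository name of this model for `model_id`.

```python
import wave


def check_clip(path, expected_seconds=1.0, tolerance=0.05):
    """Return a list of problems found with a WAV clip (empty list means OK).

    The tolerance is an assumed value, not a documented requirement.
    """
    problems = []
    # The stdlib wave module only reads PCM-encoded WAV files.
    with wave.open(path, "rb") as wav:
        sample_width = wav.getsampwidth()  # bytes per sample
        duration = wav.getnframes() / wav.getframerate()
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, got {8 * sample_width}-bit")
    if abs(duration - expected_seconds) > tolerance:
        problems.append(f"expected ~{expected_seconds:.2f} s, got {duration:.2f} s")
    return problems


def classify_clip(path, model_id):
    """Classify one clip; model_id is a placeholder for this model's repo name."""
    # Imported lazily so the format check works without transformers installed.
    from transformers import pipeline

    classifier = pipeline("audio-classification", model=model_id)
    return classifier(path)  # list of {"label": ..., "score": ...} dicts
```

A typical call would be `check_clip("clip.wav")` followed by `classify_clip("clip.wav", model_id)` once the check returns an empty list. Note that decoding a file path inside the pipeline generally relies on ffmpeg being available on the system.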