|
--- |
|
license: apache-2.0 |
|
tags: |
|
- audio |
|
- speech |
|
- speaker |
|
- speaker-recognition |
|
- speaker-embedding |
|
- speaker-verification |
|
- speaker-identification |
|
- speaker-encoder |
|
- tflite |
|
- voice |
|
library_name: sidlingvo |
|
--- |
|
|
|
# Conformer based multilingual speaker encoder |
|
|
|
## Summary |
|
|
|
This is a massively multilingual conformer-based speaker recognition model. |
|
|
|
The model was trained with public data only, using the GE2E loss. |
|
|
|
Papers: |
|
|
|
* Multilingual: https://arxiv.org/abs/2104.02125 |
|
* GE2E loss: https://arxiv.org/abs/1710.10467 |
|
|
|
```bibtex
|
@inproceedings{chojnacka2021speakerstew, |
|
title={{SpeakerStew: Scaling to many languages with a triaged multilingual text-dependent and text-independent speaker verification system}}, |
|
author={Chojnacka, Roza and Pelecanos, Jason and Wang, Quan and Moreno, Ignacio Lopez}, |
|
booktitle={Proc. Interspeech},
|
pages={1064--1068}, |
|
year={2021}, |
|
doi={10.21437/Interspeech.2021-646}, |
|
issn={2958-1796}, |
|
} |
|
|
|
@inproceedings{wan2018generalized, |
|
title={Generalized end-to-end loss for speaker verification}, |
|
author={Wan, Li and Wang, Quan and Papir, Alan and Moreno, Ignacio Lopez}, |
|
booktitle={International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, |
|
pages={4879--4883}, |
|
year={2018}, |
|
organization={IEEE} |
|
} |
|
``` |
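For intuition, the GE2E softmax loss described in the second paper can be sketched in NumPy roughly as follows. This is a minimal illustration, not this model's training code: in real training the similarity scale `w` and bias `b` are learned parameters, and the loss is computed on batches of network outputs rather than fixed arrays.

```python
import numpy as np

def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """GE2E softmax loss sketch (Wan et al., 2018).

    embeddings: array of shape (n_speakers, n_utterances, dim),
    L2-normalized per utterance. w and b are illustrative constants;
    they are trainable in the actual loss.
    """
    n_spk, n_utt, _ = embeddings.shape
    centroids = embeddings.mean(axis=1)  # one centroid per speaker
    loss = 0.0
    for j in range(n_spk):
        for i in range(n_utt):
            e = embeddings[j, i]
            sims = np.empty(n_spk)
            for k in range(n_spk):
                if k == j:
                    # Exclude the utterance itself from its own centroid
                    # to avoid a trivial solution.
                    c = (centroids[j] * n_utt - e) / (n_utt - 1)
                else:
                    c = centroids[k]
                cos = c @ e / (np.linalg.norm(c) * np.linalg.norm(e))
                sims[k] = w * cos + b
            # Softmax loss: pull the utterance toward its own speaker's
            # centroid, push it away from all other centroids.
            loss += -sims[j] + np.log(np.exp(sims).sum())
    return loss / (n_spk * n_utt)
```

Each per-utterance term is non-negative, since the log-sum-exp over all centroids is at least the similarity to the utterance's own centroid.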
|
|
|
## Usage |
|
|
|
To use this model, you will need the `sidlingvo` library: https://github.com/google/speaker-id/tree/master/lingvo
|
|
|
Since lingvo does not support Python 3.11 yet, make sure your Python version is 3.10 or earlier.
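A quick way to check this before installing is a version guard like the one below; `lingvo_python_ok` is a hypothetical helper for illustration, not part of `sidlingvo`.

```python
import sys

def lingvo_python_ok(version_info=sys.version_info):
    # Hypothetical check: per this card, lingvo supports
    # Python up to (and including) 3.10 only.
    return tuple(version_info[:2]) <= (3, 10)

print(lingvo_python_ok())
```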
|
|
|
Install the library: |
|
|
|
```shell
|
pip install sidlingvo |
|
``` |
|
|
|
Example usage: |
|
|
|
```python
|
import os |
|
from sidlingvo import wav_to_dvector |
|
from huggingface_hub import hf_hub_download |
|
|
|
repo_id = "tflite-hub/conformer-speaker-encoder" |
|
model_path = "models" |
|
hf_hub_download(repo_id=repo_id, filename="vad_long_model.tflite", local_dir=model_path) |
|
hf_hub_download(repo_id=repo_id, filename="vad_long_mean_stddev.csv", local_dir=model_path) |
|
hf_hub_download(repo_id=repo_id, filename="conformer_tisid_medium.tflite", local_dir=model_path) |
|
|
|
enroll_wav_files = ["your_first_wav_file.wav"] |
|
test_wav_file = "your_second_wav_file.wav" |
|
runner = wav_to_dvector.WavToDvectorRunner( |
|
vad_model_file=os.path.join(model_path, "vad_long_model.tflite"), |
|
vad_mean_stddev_file=os.path.join(model_path, "vad_long_mean_stddev.csv"), |
|
tisid_model_file=os.path.join(model_path, "conformer_tisid_medium.tflite")) |
|
score = runner.compute_score(enroll_wav_files, test_wav_file) |
|
print("Speaker similarity score:", score) |
|
``` |
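`compute_score` returns a single speaker-similarity score: higher means the test audio is more likely from the enrolled speaker. Assuming a cosine-style similarity (an assumption, since the score range is not documented here), a verification decision might be sketched as:

```python
def accept_same_speaker(score, threshold=0.6):
    # threshold is a hypothetical value for illustration; tune it on
    # held-out enrollment/test pairs for your target false-accept and
    # false-reject rates.
    return score >= threshold
```

In practice the threshold trades off false accepts against false rejects, so it should be calibrated on data matching your deployment conditions.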