Asteroid model

Description:

  • Code: The code corresponding to this pretrained model can be found here.

  • Notebook: A Colab notebook with examples can be found here.

  • Paper: "Multi-Decoder DPRNN: High Accuracy Source Counting and Separation", Junzhe Zhu, Raymond Yeh, Mark Hasegawa-Johnson. ICASSP 2021.

  • Summary: This model achieves state-of-the-art results on source separation with an unknown number of speakers. It uses multiple decoder heads (each handling a distinct number of speakers), plus a classifier head that selects which decoder head to use.

  • Project Page

  • Original research repo
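
The head-selection idea in the summary can be illustrated with a minimal sketch. This is not the Asteroid implementation: `select_head`, `separate`, and the toy decoders are hypothetical names introduced for illustration, and the classifier scores are made up. It only assumes that the classifier emits one score per candidate speaker count and that the highest score decides which decoder runs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Candidate speaker counts, taken from `n_srcs` in the training config.
N_SRCS = [2, 3, 4, 5]


def select_head(count_logits: Sequence[float], n_srcs: Sequence[int] = N_SRCS) -> int:
    """Return the speaker count whose classifier score is highest."""
    if len(count_logits) != len(n_srcs):
        raise ValueError("expected one score per candidate speaker count")
    best = max(range(len(n_srcs)), key=lambda i: count_logits[i])
    return n_srcs[best]


def separate(
    features: object,
    count_logits: Sequence[float],
    decoders: Dict[int, Callable[[object], List[object]]],
) -> Tuple[int, List[object]]:
    """Run only the decoder head chosen by the classifier scores.

    `decoders` maps a speaker count to a (hypothetical) decoder callable;
    scores are assumed to be ordered by ascending speaker count.
    """
    n = select_head(count_logits, sorted(decoders))
    return n, decoders[n](features)


# Toy decoders: each one just returns `n` copies of its input.
toy_decoders = {n: (lambda feats, n=n: [feats] * n) for n in N_SRCS}

n_speakers, outputs = separate("mixture", [0.1, 2.3, -0.5, 0.0], toy_decoders)
print(n_speakers, len(outputs))
```

In the real model the decoders and classifier are neural network heads sharing one separator; the sketch only shows the routing step.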

This model was trained by Joseph Zhu using the wsj0-mix-var/Multi-Decoder-DPRNN recipe in Asteroid. It was trained on the sep_count task of the Wsj0MixVar dataset.

Training config:

filterbank:
  n_filters: 64
  kernel_size: 8
  stride: 4
masknet:
  n_srcs: [2, 3, 4, 5]
  bn_chan: 128
  hid_size: 128
  chunk_size: 128
  hop_size: 64
  n_repeats: 8
  mask_act: 'sigmoid'
  bidirectional: true
  dropout: 0
  use_mulcat: false
training:
  epochs: 200
  batch_size: 2
  num_workers: 2
  half_lr: yes
  lr_decay: yes
  early_stop: yes
  gradient_clipping: 5
optim:
  optimizer: adam
  lr: 0.001
  weight_decay: 0.00000
data:
  train_dir: "data/{}speakers/wav8k/min/tr"
  valid_dir: "data/{}speakers/wav8k/min/cv"
  task: sep_count
  sample_rate: 8000
  seglen: 4.0
  minlen: 2.0
loss:
  lambda: 0.05
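
To get a feel for the sizes these settings imply, the following sketch does the arithmetic for one training segment. It rests on assumptions not stated in the config: `seglen` and `minlen` are taken to be durations in seconds, the encoder is treated as an unpadded 1-D convolution, and the chunking is assumed to pad the tail so every frame is covered. The helper names are hypothetical, and the actual recipe may pad differently.

```python
import math

# Values copied from the training config.
sample_rate = 8000
seglen, minlen = 4.0, 2.0          # assumed to be seconds
kernel_size, stride = 8, 4          # filterbank
chunk_size, hop_size = 128, 64      # masknet


def encoder_frames(n_samples: int, kernel: int, step: int) -> int:
    """Frames produced by an unpadded 1-D convolution (assumption)."""
    return (n_samples - kernel) // step + 1


def num_chunks(n_frames: int, chunk: int, hop: int) -> int:
    """Overlapping chunks needed to cover all frames, padding the tail (assumption)."""
    return max(1, math.ceil((n_frames - chunk) / hop) + 1)


segment_samples = int(seglen * sample_rate)   # samples per training segment
min_samples = int(minlen * sample_rate)       # shortest utterance kept

frames = encoder_frames(segment_samples, kernel_size, stride)
chunks = num_chunks(frames, chunk_size, hop_size)

print(segment_samples, min_samples, frames, chunks)
```

Under these assumptions a 4-second segment at 8 kHz is 32000 samples, which the encoder turns into 7999 frames, split into 124 half-overlapping chunks of 128 frames.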

Results:

'Accuracy': 0.9723333333333334, 'P-Si-SNR': 10.36027378628496

License notice:

This work "MultiDecoderDPRNN" is a derivative of CSR-I (WSJ0) Complete by LDC, used under LDC User Agreement for Non-Members (Research only). "MultiDecoderDPRNN" is licensed under Attribution-ShareAlike 3.0 Unported by Joseph Zhu.
