arXiv:2112.03752

Danna-Sep: Unite to separate them all

Published on Dec 7, 2021
Abstract

Deep learning-based music source separation has gained a lot of interest in the last decade. Most existing methods operate on either spectrograms or waveforms. Spectrogram-based models learn masks that separate the magnitude spectrogram into different sources, while waveform-based models directly generate the waveforms of the individual sources. The two types of models have complementary strengths: the former is superior for harmonic sources such as vocals, while the latter yields better results for percussion and bass instruments. In this work, we improved upon the state-of-the-art (SoTA) models and successfully combined the best of both worlds. The backbones of the proposed framework, dubbed Danna-Sep, are two spectrogram-based models, a modified X-UMX and a U-Net, together with an enhanced Demucs as the waveform-based model. Given an input mixture, we linearly combine the respective outputs of the three models to obtain the final result. We showed in our experiments that, despite its simplicity, Danna-Sep surpasses the SoTA models by a large margin in terms of Source-to-Distortion Ratio.
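The abstract's final blending step admits a compact sketch. Below is a minimal illustration, not the paper's implementation: it assumes each backbone's estimates have already been converted to waveforms (for the spectrogram models, via masking and inverse STFT), and the per-source weight values and the `blend` helper are hypothetical placeholders.

```python
import numpy as np

# Hypothetical per-model blending weights, one weight per source
# (vocals, drums, bass, other). Values are illustrative placeholders,
# not taken from the paper; they sum to 1 for each source.
WEIGHTS = {
    "xumx":   np.array([0.4, 0.2, 0.2, 0.4]),
    "unet":   np.array([0.3, 0.2, 0.2, 0.3]),
    "demucs": np.array([0.3, 0.6, 0.6, 0.3]),
}

def blend(estimates: dict[str, np.ndarray]) -> np.ndarray:
    """Linearly combine per-model source estimates.

    `estimates` maps a model name to an array of shape
    (n_sources=4, n_channels, n_samples) holding waveform-domain
    estimates; spectrogram models are assumed to have been inverted
    back to waveforms beforehand.
    """
    out = np.zeros_like(next(iter(estimates.values())))
    for name, est in estimates.items():
        # Broadcast each source's weight over channels and samples.
        out += WEIGHTS[name][:, None, None] * est
    return out

# Toy usage: random stand-ins for the three models' outputs on a
# 2-channel, 1-second, 44.1 kHz mixture.
rng = np.random.default_rng(0)
fake = {m: rng.standard_normal((4, 2, 44100)) for m in WEIGHTS}
separated = blend(fake)  # shape (4, 2, 44100)
```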
