ESVC Demo

ESVC: COMBINING ADAPTIVE STYLE FUSION AND MULTI-LEVEL FEATURE DISENTANGLEMENT FOR EXPRESSIVE SINGING VOICE CONVERSION

Abstract

Singing voice conversion (SVC) has made great strides in both naturalness and similarity for common SVC with a neutral expression. Besides singer identity, however, emotional expression is also essential for conveying the singer's emotions and attitudes, and current SVC systems cannot effectively support it. In this paper, we propose an expressive SVC framework called ESVC, which converts singer identity and emotional style simultaneously. ESVC combines the ideas of style fusion and feature disentanglement, seeking to maximize fidelity in terms of both emotional style and singer identity. First, to inject style information, we employ adaptive instance normalization (AdaIN) to fuse the content feature and the style feature. Second, given the possibility of information leakage, we introduce two disentanglement-oriented methods to decouple different kinds of singing features: mutual information (MI) is used to reduce the correlation between linguistic content, fundamental frequency (F0), and expressive features, while an adversarial triplet loss is applied to decouple identity and emotional elements. To the best of our knowledge, ESVC is the first SVC system to jointly convert singer identity and emotional style. Objective and subjective experiments demonstrate that our system significantly outperforms the state-of-the-art SVC model in terms of style expressiveness.

Model Architecture


Figure 1. Overall network architecture of ESVC. Built on so-vits-svc, it combines AdaIN Resblocks, an MI loss, and an adversarial triplet loss.
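
As a rough illustration of the style fusion step, the sketch below applies the standard AdaIN formulation to a pair of feature maps: the content feature is normalized per channel and then rescaled and shifted with the per-channel statistics of the style feature. This is a minimal sketch under stated assumptions, not the ESVC implementation. The function name `adain`, the `(channels, frames)` layout, and the random inputs are all hypothetical, and the AdaIN Resblocks in ESVC may instead derive the scale and shift from a learned style embedding.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """Standard AdaIN on feature maps shaped (channels, frames).

    Hypothetical helper for illustration only: statistics are taken
    over the time axis, separately for each channel.
    """
    c_mean = content.mean(axis=1, keepdims=True)
    c_std = content.std(axis=1, keepdims=True) + eps
    s_mean = style.mean(axis=1, keepdims=True)
    s_std = style.std(axis=1, keepdims=True) + eps
    # Remove the content statistics, then impose the style statistics.
    return s_std * (content - c_mean) / c_std + s_mean


# Assumed toy inputs: 4 channels, different numbers of frames.
rng = np.random.default_rng(0)
content = rng.normal(0.0, 1.0, size=(4, 50))
style = rng.normal(3.0, 2.0, size=(4, 80))

fused = adain(content, style)
print(fused.shape)         # same shape as the content feature
print(fused.mean(axis=1))  # close to the per-channel style means
```

The output keeps the temporal structure of the content feature while its per-channel mean and standard deviation match those of the style feature, which is the sense in which AdaIN fuses the two.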

Expressive Singing Voice Conversion


Source | Target | Converted
Song 1: "只怕我自己会爱上你" (I'm only afraid that I will fall in love with you)
Neutral
Happy
Sad
Song 2: "当某天,你若听见" (If one day you happen to hear)
Neutral
Happy
Sad
Song 3: "我用尽一生一世来将你供养" (I will spend my whole lifetime cherishing you)
Neutral
Happy
Sad

Source | Target | so-vits-svc | so-vits-svc-AdaIN | so-vits-svc-AdaIN-MI | ESVC (w/ L_emo and w/o L_sin) | ESVC (w/ L_emo and L_sin)
Male (Happy)
Male (Sad)
Female (Happy)
Female (Sad)