Knowledge Distillation from Self-Supervised Representation Learning Model with Discrete Speech Units for Any-to-Any Streaming Voice Conversion

If you have trouble playing the audio samples, try refreshing the page.

Audio samples

Seen-to-seen conversion

Source speaker's utterance (female1_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female2_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male1_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male2_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female1_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female2_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male1_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male2_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Seen-to-unseen conversion

Source speaker's utterance (female1_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female2_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male1_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male2_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female1_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female2_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male1_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male2_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Unseen-to-seen conversion

Source speaker's utterance (female1_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female2_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male1_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male2_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female1_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female2_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male1_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male2_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Unseen-to-unseen conversion

Source speaker's utterance (female1_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female2_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male1_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male2_utterance1)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female1_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (female2_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male1_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2

Source speaker's utterance (male2_utterance2)

Target speaker GT VITS FreeVC TeacherContent KdOffline(ours) KdStream(ours)
Female1
Female2
Male1
Male2