SALT


1. Abstract

Speaker anonymization aims to conceal a speaker's identity without degrading speech quality and intelligibility. Most speaker anonymization systems disentangle the speaker representation from the original speech and achieve anonymization by averaging or modifying that representation. However, for out-of-distribution speakers, the anonymized speech suffers from reduced pseudo-speaker distinctiveness, speech quality, and intelligibility. To solve this issue, we propose SALT, a Speaker Anonymization system based on Latent space Transformation. Specifically, we extract latent features with a self-supervised feature extractor, randomly sample multiple speakers and their weights, and then interpolate the latent vectors to achieve speaker anonymization. We also explore extrapolation to further extend the diversity of pseudo speakers. Experiments on the VoicePrivacy Challenge dataset show that our system achieves a state-of-the-art distinctiveness metric while preserving speech quality and intelligibility.
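The interpolation and extrapolation idea described above can be illustrated with a minimal sketch. This is not the SALT implementation: the vectors here are toy lists rather than self-supervised features, and the function names and the `spread` parameter are hypothetical, introduced only for illustration. The sketch assumes that interpolation means a weighted sum with non-negative weights that sum to 1, and that extrapolation relaxes the non-negativity constraint.

```python
import random


def sample_weights(k, rng, spread=1.0):
    """Draw k random weights that sum to 1.

    spread is a hypothetical knob for this sketch only:
    spread == 1.0 gives non-negative weights (interpolation);
    spread > 1.0 pushes weights away from 1/k, so some may leave [0, 1]
    (extrapolation).
    """
    raw = [rng.random() for _ in range(k)]
    total = sum(raw)
    convex = [r / total for r in raw]  # non-negative, sums to 1
    uniform = 1.0 / k
    return [uniform + spread * (w - uniform) for w in convex]


def mix_latents(latents, weights):
    """Return the weighted sum of equal-length latent vectors."""
    dim = len(latents[0])
    return [sum(w * vec[i] for w, vec in zip(weights, latents)) for i in range(dim)]


rng = random.Random(0)
# Toy speaker pool: 10 speakers, each a 4-dimensional latent vector.
pool = {f"spk{i}": [rng.gauss(0, 1) for _ in range(4)] for i in range(10)}

chosen = rng.sample(sorted(pool), 3)           # randomly sample speakers
weights = sample_weights(3, rng)               # interpolation weights
pseudo = mix_latents([pool[s] for s in chosen], weights)

extra_weights = sample_weights(3, rng, spread=3.0)  # extrapolation weights
pseudo_extra = mix_latents([pool[s] for s in chosen], extra_weights)
```

Because the weights always sum to 1, the pseudo speaker stays in the affine span of the sampled speakers; only the extrapolated variant can leave their convex hull.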

2. Demos -- Anonymization

The anonymized samples listed below correspond to Section 4 of our paper. We compare our proposed method (SALT) with the VoicePrivacy Challenge official baseline system (B1.a) and the top-ranked system (NWPU-ASLP).

Our proposed models are denoted [B or L]-Sx-Px, where B or L indicates the WavLM-Base or WavLM-Large encoder, Sx indicates a scale factor s = x, and Px indicates a preservation factor p = x.
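As a reading aid for the naming scheme, the sketch below decodes a label into its three parts. The exact label spelling is an assumption: the example `L-S1.0-P0.5` is hypothetical and not taken from the demo tables, and `SaltConfig` and `parse_label` are illustrative names only.

```python
import re
from typing import NamedTuple


class SaltConfig(NamedTuple):
    encoder: str         # "WavLM-Base" or "WavLM-Large"
    scale: float         # scale factor s
    preservation: float  # preservation factor p


# Assumed label shape: B or L, then S<number>, then P<number>, joined by hyphens.
_LABEL = re.compile(r"^([BL])-S(\d+(?:\.\d+)?)-P(\d+(?:\.\d+)?)$")
_ENCODERS = {"B": "WavLM-Base", "L": "WavLM-Large"}


def parse_label(label: str) -> SaltConfig:
    """Decode a [B or L]-Sx-Px label into encoder, s, and p."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a [B or L]-Sx-Px label: {label!r}")
    enc, s, p = match.groups()
    return SaltConfig(_ENCODERS[enc], float(s), float(p))


config = parse_label("L-S1.0-P0.5")  # hypothetical label
```

Here `config` is the WavLM-Large encoder with s = 1.0 and p = 0.5.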

3. Demos -- Accent Preservation

The ASR results listed below, for speech clips with obvious accents, correspond to Section 4.1 of our paper.

We notice that Whisper Large usually recognizes accented speech better than U2++. We hypothesize that this is due to the accent robustness of the Whisper ASR model.

Words marked in red are those spoken with a strong accent.

SALT: Distinguishable Speaker Anonymization Through Latent Space Transformation

Yuanjun Lv¹ ², Jixun Yao¹ ², Peikun Chen¹, Hongbin Zhou², Heng Lu², Lei Xie¹

¹Audio, Speech and Language Processing Group (ASLP@NPU)
School of Computer Science, Northwestern Polytechnical University, Xi'an, China

²Ximalaya Inc., China

Contents

1. Abstract

2. Demos -- Anonymization

3. Demos -- Accent Preservation
