Griffin-Lim Vocoder

2021-11-07

A traditional vocoder based on an iterative phase-estimation algorithm.

Link: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1164317

Goal

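Reconstruct a voice waveform from its amplitude spectrum alone.

The amplitude spectrum carries no phase information, so the task reduces to estimating a phase spectrum that fits it: amplitude spectrum -> phase spectrum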

Method

1. Initialize the phase spectrum randomly.
2. Synthesize a voice waveform from the amplitude spectrum and the phase spectrum with the ISTFT (Inverse Short-Time Fourier Transform).
3. Apply the STFT to the new waveform to get a new amplitude spectrum and a new phase spectrum.
4. Drop the new amplitude spectrum, keep the new phase spectrum together with the original amplitude spectrum, and go to step 2.

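import numpy as np

# _stft, _istft and hparams are defined elsewhere (not shown here); S is the given spectrogram.
phases = np.exp(2j * np.pi * np.random.rand(*S.shape))  # random initial phase
S_complex = np.abs(S).astype(complex)  # np.complex was removed in NumPy 1.24
y = _istft(S_complex * phases, hparams)  # first waveform estimate
for i in range(hparams.griffin_lim_iters):
    phases = np.exp(1j * np.angle(_stft(y, hparams)))  # keep only the re-estimated phase
    y = _istft(S_complex * phases, hparams)  # resynthesize with the original amplitude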
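The snippet above depends on helpers that are not shown, so here is a minimal self-contained sketch of the same loop using only NumPy. Everything in it is an assumption made for illustration: the function names `stft`, `istft`, `griffin_lim` and `spectral_error`, the Hamming window, and the defaults `n_fft=256` and `hop=64` are not the original `_stft`, `_istft` or `hparams`. The inverse transform is written as a windowed overlap-add divided by the summed squared window, which is one common way to build a least-squares ISTFT.

```python
import numpy as np


def stft(y, n_fft=256, hop=64):
    """Frame the signal, apply a Hamming window, and take the FFT of each frame."""
    window = np.hamming(n_fft)
    n_frames = 1 + (len(y) - n_fft) // hop
    frames = np.stack([y[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return np.fft.fft(frames * window, axis=1)  # shape: (n_frames, n_fft)


def istft(S, n_fft=256, hop=64):
    """Windowed overlap-add, normalized by the summed squared window."""
    window = np.hamming(n_fft)
    n_frames = S.shape[0]
    length = n_fft + hop * (n_frames - 1)
    y = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.ifft(S, axis=1).real  # keep the real part: the waveform is real
    for i in range(n_frames):
        start = i * hop
        y[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    return y / norm  # Hamming is never zero, so norm > 0 everywhere


def griffin_lim(magnitude, n_iters=50, n_fft=256, hop=64, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `magnitude`."""
    rng = np.random.default_rng(seed)
    # Random initial phase.
    phases = np.exp(2j * np.pi * rng.random(magnitude.shape))
    y = istft(magnitude * phases, n_fft, hop)
    for _ in range(n_iters):
        # Re-analyze, keep only the new phase, resynthesize with the given magnitude.
        phases = np.exp(1j * np.angle(stft(y, n_fft, hop)))
        y = istft(magnitude * phases, n_fft, hop)
    return y


def spectral_error(y, magnitude, n_fft=256, hop=64):
    """Squared distance between the STFT magnitude of y and the target magnitude."""
    return float(np.sum((np.abs(stft(y, n_fft, hop)) - magnitude) ** 2))


if __name__ == "__main__":
    # Toy input: two sinusoids whose length fits the frames exactly.
    t = np.arange(256 + 64 * 30)
    signal = np.sin(2 * np.pi * 0.01 * t) + 0.5 * np.sin(2 * np.pi * 0.05 * t)
    magnitude = np.abs(stft(signal))
    for n in (0, 5, 30):
        y = griffin_lim(magnitude, n_iters=n)
        print(n, spectral_error(y, magnitude))
```

Running it prints the spectral error after 0, 5 and 30 iterations for a toy signal, which is a quick way to watch the estimate become more consistent with the target amplitude spectrum. A real vocoder would more likely use a one-sided FFT and padding at the signal edges; the full FFT is used here only to keep the sketch short.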

Why?

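The appendix of the paper shows that the squared error between the STFT magnitude of the current estimate and the given amplitude spectrum never increases from one iteration to the next, so each pass makes the estimate at least as consistent with the target as the one before.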
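The quantity in question can be written down compactly. The notation here is shorthand chosen for this note, not the paper's: A(m, k) is the given amplitude spectrum at frame m and frequency bin k, x^(i) is the waveform after iteration i, and X^(i) is its STFT.

```latex
D\big(x^{(i)}\big) = \sum_{m}\sum_{k} \Big( \big|X^{(i)}(m,k)\big| - A(m,k) \Big)^{2},
\qquad
D\big(x^{(i+1)}\big) \le D\big(x^{(i)}\big)
```

The inequality relies on the ISTFT being the least-squares inverse of the STFT. It guarantees only that the error does not grow, not that it reaches zero.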

The proof itself is hard for me to follow right now, so I may come back to it later.