jling
April 26, 2023, 4:08am
My eventual goal is a pure-Julia implementation of Whisper inference, but I got stuck before the first step: computing the log-Mel spectrogram.
In Whisper's PyTorch code, most of the spectrogram is simply a Hann window combined with an STFT:
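```python
import torch
from whisper.audio import HOP_LENGTH, N_FFT, mel_filters  # N_FFT = 400, HOP_LENGTH = 160

# audio: 1-D float tensor of 16 kHz samples; n_mels: number of Mel bands (e.g. 80)
window = torch.hann_window(N_FFT).to(audio.device)  # periodic Hann window
stft = torch.stft(audio, N_FFT, HOP_LENGTH, window=window, return_complex=True)
magnitudes = stft[..., :-1].abs() ** 2  # power spectrum, dropping the last STFT frame
filters = mel_filters(audio.device, n_mels)  # (n_mels, N_FFT // 2 + 1) filterbank matrix
```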
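For anyone porting this, here is a pure-Python sketch of what the window and STFT lines compute for a single frame. `torch.hann_window` defaults to a periodic window, w[n] = 0.5 - 0.5·cos(2πn/N), and each STFT column is the one-sided DFT of one windowed frame. `hann_periodic` and `frame_power_spectrum` are hypothetical helper names, a naive O(N²) DFT stands in for an FFT, and the sketch ignores the reflect padding that `torch.stft` applies by default (`center=True`).

```python
import cmath
import math


def hann_periodic(n_fft):
    """Periodic Hann window, as torch.hann_window(n_fft) returns by default."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]


def frame_power_spectrum(frame, window):
    """|DFT|^2 of one windowed frame, keeping the n_fft // 2 + 1 one-sided bins."""
    n_fft = len(window)
    windowed = [x * w for x, w in zip(frame, window)]
    power = []
    for k in range(n_fft // 2 + 1):
        # Naive DFT bin k; an FFT gives the same values much faster.
        acc = sum(
            windowed[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
            for n in range(n_fft)
        )
        power.append(abs(acc) ** 2)
    return power


if __name__ == "__main__":
    N_FFT = 400  # Whisper's 25 ms window at 16 kHz
    SAMPLE_RATE = 16000
    # A 1 kHz tone completes exactly 25 cycles in 400 samples, so it peaks in bin 25.
    tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N_FFT)]
    spec = frame_power_spectrum(tone, hann_periodic(N_FFT))
    print(len(spec), max(range(len(spec)), key=spec.__getitem__))  # 201 25
```

The periodic/symmetric distinction is worth checking when matching Whisper's numbers from another library, since a symmetric Hann window of the same length gives slightly different values.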
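In Julia, the window and the STFT are available in DSP.jl (`hanning` and `stft`; see the DSP.jl documentation page "Periodograms - periodogram estimation").
But where can I find the Mel filterbank matrix, i.e. the audio-specific frequency-scaling model that `mel_filters` returns?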
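Whisper does not build this matrix at runtime: `mel_filters` loads a precomputed array from `mel_filters.npz`, which the Whisper source says was saved from `librosa.filters.mel(sr=16000, n_fft=400, n_mels=80)`. librosa's default uses the Slaney Mel scale (linear below 1 kHz, logarithmic above) with Slaney area normalization. The sketch below follows that construction as I understand it; `mel_filterbank`, `hz_to_mel` and `mel_to_hz` are hypothetical names, and its output should be compared numerically against the stored matrix before being trusted as a drop-in replacement.

```python
import math

# Slaney-style Mel scale (assumed to match librosa's default, htk=False).
F_SP = 200.0 / 3               # Hz per Mel in the linear region below 1 kHz
MIN_LOG_HZ = 1000.0
MIN_LOG_MEL = MIN_LOG_HZ / F_SP  # 15 Mel at 1 kHz
LOGSTEP = math.log(6.4) / 27.0


def hz_to_mel(f):
    if f < MIN_LOG_HZ:
        return f / F_SP
    return MIN_LOG_MEL + math.log(f / MIN_LOG_HZ) / LOGSTEP


def mel_to_hz(m):
    if m < MIN_LOG_MEL:
        return m * F_SP
    return MIN_LOG_HZ * math.exp(LOGSTEP * (m - MIN_LOG_MEL))


def mel_filterbank(sr=16000, n_fft=400, n_mels=80):
    """Return an n_mels x (n_fft // 2 + 1) matrix of triangular Mel filters."""
    n_bins = n_fft // 2 + 1
    fft_freqs = [k * sr / n_fft for k in range(n_bins)]
    # n_mels + 2 edge frequencies, evenly spaced in Mel from 0 Hz to Nyquist.
    top = hz_to_mel(sr / 2)
    edges = [mel_to_hz(top * i / (n_mels + 1)) for i in range(n_mels + 2)]
    weights = []
    for i in range(n_mels):
        lo, center, hi = edges[i], edges[i + 1], edges[i + 2]
        norm = 2.0 / (hi - lo)  # Slaney normalization: roughly constant area per filter
        row = []
        for f in fft_freqs:
            rising = (f - lo) / (center - lo)
            falling = (hi - f) / (hi - center)
            row.append(norm * max(0.0, min(rising, falling)))
        weights.append(row)
    return weights
```

In Whisper the matrix is then applied as `mel_spec = filters @ magnitudes`, turning the (201, frames) power spectrum into an (80, frames) Mel spectrogram.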
To make a log-Mel spectrogram from a DSP.jl spectrogram, I believe you need to remap the frequency axis according to the Mel-scale formula and express the amplitudes in dB.
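For comparison, a pure-Python sketch of the generic dB conversion next to the log compression Whisper itself uses. Whisper's `log_mel_spectrogram` does not convert to dB: it clamps the Mel energies at 1e-10, takes `log10`, clips everything more than 8.0 below the maximum, and rescales with `(x + 4.0) / 4.0`. The helper names here are hypothetical, and the input is a flat list, so the maximum is taken over the whole spectrogram as in Whisper.

```python
import math


def power_to_db(power, floor=1e-10):
    """Generic conversion of power values to decibels: 10 * log10(P)."""
    return [10.0 * math.log10(max(p, floor)) for p in power]


def whisper_log_mel(mel_energies):
    """Whisper-style compression of a flattened Mel spectrogram.

    Steps: clamp at 1e-10, take log10, clip to (max - 8.0),
    then map each value with (x + 4.0) / 4.0.
    """
    log_spec = [math.log10(max(e, 1e-10)) for e in mel_energies]
    ceiling = max(log_spec)
    return [(max(v, ceiling - 8.0) + 4.0) / 4.0 for v in log_spec]
```

The two outputs differ only by scaling, offset and clipping (log10 versus 10·log10), but a Julia port has to reproduce Whisper's exact variant for the model's weights to see the inputs they were trained on.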
Also linking this related thread.