How to make a log-Mel spectrogram?

My eventual goal is a pure-Julia Whisper inference implementation, but I got stuck before the first step.

Most ingredients of the spectrogram are simply a combination of a Hanning window and an STFT:

In Julia, we can find them in DSP.jl (Periodograms - periodogram estimation · DSP.jl).

But where can I find the mel filterbank matrix, which is an audio-specific frequency-scaling model?

To make a log-Mel spectrogram from a DSP spectrogram, I believe you need to transform the frequency axis according to the Mel-scale formula and display the amplitudes in dB.
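
As a rough illustration of that idea, here is a minimal dependency-free sketch. It assumes the HTK-style mel formula, plain triangular filters without area normalization, and a power spectrogram stored as a (frequency bins × frames) matrix. The names `hz_to_mel`, `mel_to_hz`, `mel_filterbank` and `log_mel` are made up for this example and do not come from DSP.jl or any other package. Whisper's reference filters were generated with librosa's default settings (Slaney-style scale and normalization), so the numbers from this sketch will not match them exactly.

```julia
# HTK-style Hz <-> mel conversions (an assumption; other conventions exist).
hz_to_mel(f) = 2595 * log10(1 + f / 700)
mel_to_hz(m) = 700 * (10^(m / 2595) - 1)

# Build an n_mels × (n_fft ÷ 2 + 1) matrix of triangular mel filters.
# Hypothetical helper, not part of any package.
function mel_filterbank(sr, n_fft, n_mels; fmin = 0.0, fmax = sr / 2)
    n_bins = n_fft ÷ 2 + 1
    # Centre frequency (Hz) of every one-sided FFT bin.
    bin_hz = range(0, sr / 2; length = n_bins)
    # n_mels + 2 edge frequencies, equally spaced on the mel axis.
    edges = mel_to_hz.(range(hz_to_mel(fmin), hz_to_mel(fmax); length = n_mels + 2))
    fb = zeros(Float64, n_mels, n_bins)
    for m in 1:n_mels
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        for (k, f) in enumerate(bin_hz)
            rising = (f - lo) / (mid - lo)    # left slope of the triangle
            falling = (hi - f) / (hi - mid)   # right slope of the triangle
            fb[m, k] = max(0.0, min(rising, falling))
        end
    end
    return fb
end

# Project a power spectrogram onto the mel filters and convert to dB.
# `amin` avoids taking the log of zero.
log_mel(fb, power; amin = 1e-10) = 10 .* log10.(max.(fb * power, amin))

# Usage with Whisper-like sizes: 16 kHz audio, 400-point FFT, 80 mel bands.
fb = mel_filterbank(16_000, 400, 80)   # 80 × 201 matrix
power = rand(201, 10)                  # stand-in for a real power spectrogram
S = log_mel(fb, power)                 # 80 × 10 matrix, in dB
```

In practice `power` would come from an STFT, for example the power matrix of a DSP.jl spectrogram, as long as its number of frequency bins matches `n_fft ÷ 2 + 1`.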

Linking also this “related” thread.

You can also take a look at MFCC.jl, MusicProcessing.jl and SpeechFeatures.jl.