Speech-based Emotion Recognition


I wasn’t able to find any package for Speech-based Mood/Emotion recognition to use in a project I’m working on, so I need some help setting up something myself.

I know that LSTMs are used for this, I’ve seen WaveNet adapted for almost every other audio problem at this point, and I have even seen some of the slightly outdated Spatio-Temporal Box Filters.

I guess what I really want to ask is: what would be a good place to start? Not necessarily the best, most accurate, or even fastest approach, but the simplest to implement.

Thanks in advance!

There’s a recent article that may be a good start: https://www.assemblyai.com/blog/end-to-end-speech-recognition-pytorch
There is a port of PyTorch in Julia:

PyTorch support for LSTM:

There is also this GSoC project:

And some useful packages here:

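Since the question was about the simplest thing to implement rather than the most accurate, here is a minimal baseline sketch in plain Python (standard library only). To be clear, this is not the LSTM or WaveNet approach mentioned above: it swaps in two hand-made features (frame energy and zero-crossing rate) and a nearest-centroid classifier, which is only my assumption about what "simplest" could look like. The names `frame_features` and `NearestCentroid` are hypothetical, and the frame sizes (400 samples with a hop of 160, i.e. 25 ms and 10 ms at an assumed 16 kHz sample rate) are illustrative choices, not values taken from any of the links.

```python
import math
from collections import defaultdict


def frame_features(samples, frame_len=400, hop=160):
    """Summarise a mono signal (list of floats) as three numbers:
    mean frame energy, std of frame energy, mean zero-crossing rate.
    frame_len/hop are assumed values (25 ms / 10 ms at 16 kHz)."""
    energies, zcrs = [], []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Average power of the frame.
        energies.append(sum(x * x for x in frame) / frame_len)
        # Fraction of adjacent sample pairs whose sign differs.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        zcrs.append(crossings / (frame_len - 1))
    if not energies:
        raise ValueError("signal is shorter than one frame")
    mean_e = sum(energies) / len(energies)
    std_e = math.sqrt(sum((e - mean_e) ** 2 for e in energies) / len(energies))
    mean_z = sum(zcrs) / len(zcrs)
    return [mean_e, std_e, mean_z]


class NearestCentroid:
    """Hypothetical helper: stores one mean feature vector per label and
    predicts the label whose centroid is closest (Euclidean distance)."""

    def fit(self, X, y):
        grouped = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        # Column-wise mean of the feature rows for each label.
        self.centroids = {
            label: [sum(col) / len(rows) for col in zip(*rows)]
            for label, rows in grouped.items()
        }
        return self

    def predict(self, row):
        return min(
            self.centroids,
            key=lambda label: math.dist(row, self.centroids[label]),
        )
```

You would call `frame_features` on each labelled recording, pass the resulting rows and labels to `NearestCentroid().fit(X, y)`, and then call `predict` on the features of a new clip. The features are not rescaled here, so energy will tend to dominate the distance; standardising each feature first is probably the first thing to add. I would treat this purely as a sanity-check baseline before moving on to MFCC-style features and an LSTM.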