I wasn’t able to find any package for speech-based mood/emotion recognition to use in a project I’m working on, so I need some help setting something up myself.
I know that LSTMs are used for this. I’ve also seen WaveNet adapted for almost every other audio problem at this point, and I have even seen some of the slightly outdated Spatio-Temporal Box Filters.
I guess what I really want to ask is: what would be a good place to start? Not necessarily the best, most accurate, or even fastest approach, but the simplest to implement.