I’m fairly new to Knet and still finding my way around. The project I’m working on right now is a simple “speech emotion recognizer”. Unfortunately, the RNN tutorial makes for a better introduction to NLP problems than to signal processing, so I wanted to ask a few questions after receiving
AssertionError: vec(value(x)) isa WTYPE
one too many times.
Basic Idea
Input + A Bidirectional Layer + Dense Layer => Output
A nice example in Flux can be found here:
Input Format
The dataset I’m using is https://smartlaboratory.org/ravdess/.
I’m using only 13 mel-frequency cepstral coefficients (MFCCs) for each sample I take from a given recording (usually between 300 and 600 samples per recording). I’m training on the Neutral (01) and Happy (03) emotions.
So that’s 13 features, over sequences of 300 to 600 in length, to learn 2 classes.
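From what I understand of the Knet docs, RNN ultimately wants a dense numeric array of size (X,B,T), i.e. (features, batchsize, seqlength), rather than a sequence of vectors. So the minibatch layout I think I should be aiming for is roughly this (sizes are just my own configuration):
# The (X,B,T) layout I believe Knet's RNN expects
x = randn(Float64, 13, 32, 16)   # 13 features, 32 sequences, 16 timesteps
y = rand(1:2, 32, 16)            # integer class labels per timestep, for nll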
Approach So Far
#imports and config
using BSON
ENV["COLUMNS"] = 72
using Pkg; for p in ("Knet","IterTools","Plots"); haskey(Pkg.installed(),p) || Pkg.add(p); end
using Random: shuffle!
using Base.Iterators: flatten
using IterTools: ncycle, takenth
using Knet: Knet, AutoGrad, param, param0, mat, RNN, relu, Data, adam, progress, nll, zeroone
# Usual Chain Definition
struct Chain
    layers
    Chain(layers...) = new(layers)
end
(c::Chain)(x) = (for l in c.layers; x = l(x); end; x)
(c::Chain)(x,y) = nll(c(x),y)
# Usual Dense Layer Definition
struct Dense; w; b; f; end
Dense(i::Int,o::Int,f=identity) = Dense(param(o,i), param0(o), f)
(d::Dense)(x) = d.f.(d.w * mat(x,dims=1) .+ d.b)
...
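These pieces seem fine in isolation; a quick throwaway shape check on the Dense layer behaves as I’d expect:
# Shape check with random data (not my real features)
dense = Dense(13, 2)
summary(dense(randn(Float64, 13, 32)))   # something like "2×32 Array{Float64,2}"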
#After loading in my dataset from another file where it has been preprocessed
println.(summary.((Xs, Ys)));
> 288-element Array{Any,1}
> 288-element Array{Any,1}
#I get the feeling there's something wrong here; below is what the first entries in Xs and Ys look like:
#Features
Xs[1]
> 328-element Array{Array{Float64,1},1}:
> [-158.44758646562016, -14.786369432867609, ... ]
> [-504.61429557613394, 16.563930341805474, ... ] ...
#Labels
Ys[1]
> 328-element Array{Array{Float64,1},1}:
> [1.0]
> [1.0]
> [1.0] ...
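One thing I’m already suspicious of: as far as I can tell nll wants plain integer class indices, not one-element Float vectors, so maybe I should be flattening these first. Something like this conversion is what I had in mind (assuming my classes end up encoded as 1.0 and 2.0):
# Guessed label conversion: Vector of [1.0]/[2.0] vectors -> Vector{Int}
ys1 = Int.(first.(Ys[1]))   # 328-element integer vector with values in {1,2}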
#For the sequence-batching, I tried following the tutorial, which admittedly was not a very smart move
#Arbitrary, should probably be changed
BATCHSIZE = 32
SEQLENGTH = 16
HIDDENSIZE = 64;   # hidden units for the RNN, also arbitrary (EMREC() below needs it)
function seqbatch(x,y,B,T)
    N = length(x) ÷ B
    x = permutedims(reshape(x[1:N*B],N,B))
    y = permutedims(reshape(y[1:N*B],N,B))
    d = []
    for i in 0:T:N-T
        push!(d, (x[:,i+1:i+T], y[:,i+1:i+T]))
    end
    return d
end
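On toy data the windowing itself seems to behave (just a shape check with made-up integers):
# Toy check: 100 fake timesteps, B=4, T=5
toy = seqbatch(collect(1:100), collect(101:200), 4, 5)
length(toy)          # 5 windows (N = 100÷4 = 25, so i = 0,5,10,15,20)
summary(toy[1][1])   # a 4×5 integer matrix, i.e. one B×T window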
allX = vcat((x->x[:,1]).(Xs)...)
allY = vcat((x->x[:,1]).(Ys)...);
d = seqbatch(allX, allY, BATCHSIZE, SEQLENGTH);
shuffle!(d)
dtst = d[1:10]
dtrn = d[11:end];
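Inspecting one of these batches makes me suspicious about my types: each entry comes out as a matrix of vectors rather than a dense 3-D array, which may well be what trips the WTYPE assertion:
# Each batch element is a 32×16 matrix whose entries are 13-element vectors,
# i.e. Array{Array{Float64,1},2}, not a dense Array{Float64,3}
typeof(dtrn[1][1])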
#Training Method
function trainresults(file,maker,savemodel)
    model = maker()
    results = ((nll(model,dtst), zeroone(model,dtst))
               for x in takenth(progress(adam(model,ncycle(dtrn,5))),100))
    results = reshape(collect(Float32,flatten(results)),(2,:))
    Knet.save(file,"model",(savemodel ? model : nothing),"results",results)
    Knet.gc() # to free GPU memory
    println(minimum(results,dims=2))
    return model,results
end
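(Following the tutorials, I read a saved run back in later with Knet.load:)
# model is `nothing` when savemodel=false
model, results = Knet.load("emrec.jld2", "model", "results")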
BIRNN(input,hidden,output)= # biRNN Tagger, Float64 instead of the default 32
Chain(RNN(input,hidden,rnnType=:relu,bidirectional=true,dataType=Float64),Dense(2hidden,output));
#I assume input corresponds to my feature count of 13, since I'm not trying
#anything funny with strides/dilations etc., and my output to 1 (though I'm
#worried I'm accidentally doing regression instead of classification here)
EMREC() = BIRNN(13,HIDDENSIZE,1)
(tEm,rEm) = trainresults("emrec.jld2",EMREC,true);
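For what it’s worth, here is the kind of smoke test I’ve been poking at the shapes with (fake data only, assuming the (X,B,T) layout from above is right):
# Smoke test with fake (X,B,T) = (13,32,16) input
m = EMREC()
xfake = randn(Float64, 13, 32, 16)
summary(m(xfake))   # hoping for "1×512 Array{Float64,2}" (output × B*T)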
I can clearly see I have more than one issue going on here, but my real concern is what dimensionality my input should have. The tutorial uses an embedding layer for its words, but in my case that seems to be out of the question.
Am I feeding in the input properly? If so, what’s the matter? Or is the issue strictly related to the typing of my minibatch?
Thanks in advance.