Knet: Need Help Generating Minibatches for RNN input

I’m fairly new to Knet and still trying to find my way around. The project I’m working on right now is a simple “speech emotion recognizer”. Unfortunately, the RNN tutorial is a better introduction to NLP problems than to signal processing, so I wanted to ask a few questions after getting this error one too many times:

AssertionError: vec(value(x)) isa WTYPE

Basic Idea

Input + A Bidirectional Layer + Dense Layer => Output

A nice example in Flux can be found here:

Input Format

The dataset I’m using is https://smartlaboratory.org/ravdess/.

I’m only using 13 mel-frequency cepstral coefficients (MFCCs) for each sample (frame) I take from a given recording, and a recording usually yields between 300 and 600 samples. I’m training on the Neutral (01) and Happy (03) emotions.

So that’s 13 features per time step, over sequences 300 to 600 steps long, to learn 2 classes.
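With those shapes in mind, the first thing an RNN layer needs is a plain numeric array rather than a Vector of per-frame vectors. A minimal sketch of turning one recording into a `Float64` array, assuming each frame is a 13-element `Vector{Float64}` as printed for `Xs[1]` further down. `recording_matrix` is a hypothetical helper and the data is random. The final reshape assumes, as I read Knet's RNN docstring, that a 2-D input is interpreted as (X, B) (one time step for B sequences), so a single sequence needs an explicit batch dimension of 1.

```julia
# Hypothetical helper: stack one recording's frames (each a 13-element Vector{Float64})
# into a dense 13 x T matrix, features along dim 1 and time along dim 2.
recording_matrix(frames) = Matrix{Float64}(reduce(hcat, frames))

# Synthetic stand-in for one recording such as Xs[1] (328 frames of 13 MFCCs)
frames = [randn(13) for _ in 1:328]
M = recording_matrix(frames)          # 13 x 328

# Add a batch dimension of 1 so the layout is (X, B, T) = (13, 1, 328)
x = reshape(M, 13, 1, :)
```

Because Julia arrays are column-major, `x[:, 1, t]` is exactly frame `t`, so no data is shuffled by the reshape.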

Approach So Far

#imports and config

using BSON

ENV["COLUMNS"] = 72
using Pkg; for p in ("Knet","IterTools","Plots"); haskey(Pkg.installed(),p) || Pkg.add(p); end
using Random: shuffle!
using Base.Iterators: flatten
using IterTools: ncycle, takenth
using Knet: Knet, AutoGrad, param, param0, mat, RNN, relu, Data, adam, progress, nll, zeroone

# Usual Chain Definition
struct Chain
    layers
    Chain(layers...) = new(layers)
end
(c::Chain)(x) = (for l in c.layers; x = l(x); end; x)
(c::Chain)(x,y) = nll(c(x),y)

# Usual Dense Layer Definition
struct Dense; w; b; f; end
Dense(i::Int,o::Int,f=identity) = Dense(param(o,i), param0(o), f)
(d::Dense)(x) = d.f.(d.w * mat(x,dims=1) .+ d.b)

...

#After loading in my dataset from another file where it has been preprocessed

println.(summary.((Xs, Ys)));

> 288-element Array{Any,1}
> 288-element Array{Any,1}

#I get the feeling there's something wrong here. Below are the first entries of Xs and Ys:

#Features
Xs[1]
> 328-element Array{Array{Float64,1},1}:
> [-158.44758646562016, -14.786369432867609, ... ] 
> [-504.61429557613394, 16.563930341805474, ... ] ...

#Labels
Ys[1]
> 328-element Array{Array{Float64,1},1}:
> [1.0]
> [1.0]
> [1.0] ...

#For the sequence batching, I tried following the tutorial, which was admittedly not a very smart move

#Arbitrary, should probably be changed
BATCHSIZE = 32
SEQLENGTH = 16;

function seqbatch(x,y,B,T)
    N = length(x) ÷ B
    #println(N)
    x = permutedims(reshape(x[1:N*B],N,B))
    #println(x)
    y = permutedims(reshape(y[1:N*B],N,B))
    #println(y)
    
    d = []; for i in 0:T:N-T
        push!(d, (x[:,i+1:i+T], y[:,i+1:i+T]))
    end
    return d
end

allX = vcat((x->x[:,1]).(Xs)...)
allY = vcat((x->x[:,1]).(Ys)...);

d = seqbatch(allX, allY, BATCHSIZE, SEQLENGTH);

shuffle!(d)
dtst = d[1:10]
dtrn = d[11:end];

#Training Method

function trainresults(file,maker,savemodel)
    model = maker()
    results = ((nll(model,dtst), zeroone(model,dtst))
               for x in takenth(progress(adam(model,ncycle(dtrn,5))),100))
    results = reshape(collect(Float32,flatten(results)),(2,:))
    Knet.save(file,"model",(savemodel ? model : nothing),"results",results)
    Knet.gc() # To save gpu memory
    println(minimum(results,dims=2))
    return model,results
end

BIRNN(input,hidden,output)=  # biRNN Tagger, Float64 instead of the default Float32
Chain(RNN(input,hidden,rnnType=:relu,bidirectional=true,dataType=Float64),Dense(2hidden,output));

#I assume input corresponds to my feature count of 13, since I'm not doing anything unusual with strides/dilations etc., and output to 1 (though I'm worried I'm accidentally doing regression instead of classification here)

EMREC() = BIRNN(13,HIDDENSIZE,1)
(tEm,rEm) = trainresults("emrec.jld2",EMREC,true);
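On the regression worry in the comment above: with `Dense(2hidden, 1)` the model produces a single score per frame. As far as I know, Knet's `nll` applies a log-softmax over the class dimension (`dims=1` by default) and expects integer class labels, so a single row always gets probability 1 and the loss is zero whatever the weights are. The following plain-Julia stand-in (`nll_sketch` is a hypothetical name, not Knet's implementation) makes that visible without needing Knet installed.

```julia
# Plain-Julia sketch of negative log-likelihood over softmax scores, written only
# to illustrate the shape argument; this is not Knet's actual code.
function nll_sketch(scores::AbstractMatrix{<:Real}, labels::AbstractVector{<:Integer})
    # numerically stable log-softmax down each column (dims = 1 is the class axis)
    m = maximum(scores, dims=1)
    logp = scores .- m .- log.(sum(exp.(scores .- m), dims=1))
    # log-probability of the correct class in every column, averaged over columns
    n = size(scores, 2)
    return -sum(logp[labels[j], j] for j in 1:n) / n
end

scores1 = randn(1, 5)            # output size 1: one row of scores
scores2 = randn(2, 5)            # output size 2: one row per class
labels  = [1, 2, 2, 1, 2]        # integer class indices, which is what nll expects

loss1 = nll_sketch(scores1, ones(Int, 5))   # zero: a 1-way softmax is always 1
loss2 = nll_sketch(scores2, labels)         # positive and depends on the scores
```

If this reading is right, `BIRNN(13, HIDDENSIZE, 2)` together with `Int` labels in `1:2` (instead of `[1.0]`-style vectors) would make the model a genuine two-class classifier. With a single output row, any label other than 1 would also index a row that does not exist.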

I can clearly see I have more than one issue going on here, but my real concern is what dimensionality my input should be. The tutorial uses an embedding layer to encode the words, but in my case that seems to be out of the question.

Am I feeding in the input properly? If so, what’s the matter? Or is the issue strictly related to the type of my minibatch?
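On the typing question: my reading of `vec(value(x)) isa WTYPE` is that the RNN checks that its input is the same kind of numeric array as its weights (here `Array{Float64}`, because of `dataType=Float64`). Reproducing the flattening from the post with synthetic data shows what `seqbatch` actually receives. `Xs_demo`, `allX_demo` and `xb` are illustrative names; only the `vcat` expression and the reshape/permutedims pattern come from the post.

```julia
# Synthetic stand-ins for two recordings, structured like Xs[1] above:
# one Vector of 13-element MFCC frames per recording.
Xs_demo = Any[[randn(13) for _ in 1:328], [randn(13) for _ in 1:300]]

# The same flattening expression as in the post
allX_demo = vcat((x -> x[:, 1]).(Xs_demo)...)

# x[:, 1] on a Vector just returns a copy of it, so the result is still an
# array of arrays: each element is one whole 13-element frame.
@show eltype(allX_demo)

# One B x T minibatch slice, built the way seqbatch builds it (here N == T)
B, T = 4, 3
xb = permutedims(reshape(allX_demo[1:B*T], T, B))

# vec(xb) is a Vector whose elements are themselves vectors,
# not a Vector{Float64} like the RNN's Float64 weights.
@show eltype(vec(xb))
```

So the minibatch has the right (B x T) layout for the tutorial's word IDs, but each entry is a whole frame instead of a number, which is presumably why the assertion fires.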

Thanks in advance.

Guess I’m answering my own question unless someone has a better solution. I found that modifying the tutorial’s seqbatch to output tensors of dimensions (Features x B x T) works virtually identically to the tutorial’s approach of outputting (B x T) tensors and using an embedding layer to expand them to (X x B x T).
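A minimal sketch of what a (Features x B x T) variant of the tutorial’s `seqbatch` could look like. It assumes the recordings have already been stacked into one 13 x L `Float64` matrix (frames concatenated along time) and that the frame labels are integer class indices (1 or 2). `seqbatch_features` is a hypothetical name; it mirrors the tutorial’s stream-splitting logic and is not necessarily the exact code described above.

```julia
# Hypothetical (Features x B x T) variant of the tutorial's seqbatch.
# X: F x L matrix (all frames concatenated along time)
# y: length-L vector of Int class labels, one per frame
# Returns (x, y) pairs with x of size (F, B, T) and y of size (B, T).
function seqbatch_features(X::AbstractMatrix{<:Real}, y::AbstractVector{<:Integer}, B::Int, T::Int)
    F, L = size(X)
    @assert length(y) == L "expected one label per frame"
    N = L ÷ B                                  # length of each of the B parallel streams
    # split the first N*B frames into B contiguous streams, then put batch before time
    xs = permutedims(reshape(X[:, 1:N*B], F, N, B), (1, 3, 2))   # F x B x N
    ys = permutedims(reshape(y[1:N*B], N, B))                     # B x N, as in the tutorial
    d = Tuple{Array{Float64,3},Matrix{Int}}[]
    for i in 0:T:N-T
        push!(d, (Array{Float64,3}(xs[:, :, i+1:i+T]), Matrix{Int}(ys[:, i+1:i+T])))
    end
    return d
end

# Synthetic stand-in: two recordings of 13 MFCCs, labelled 1 and 2 frame by frame
recs = [randn(13, 328), randn(13, 300)]
labs = [fill(1, 328), fill(2, 300)]
X = reduce(hcat, recs)          # 13 x 628
y = reduce(vcat, labs)          # 628 labels
d = seqbatch_features(X, y, 32, 16)
```

Each `x` batch has the (X, B, T) layout the RNN consumes. The bidirectional RNN then returns (2H, B, T), and `mat(x, dims=1)` in `Dense` flattens that to (2H, B*T) in column-major order, which lines up column for column with the (B x T) label matrix. Like the tutorial, this treats the data as one continuous stream, so a window can straddle the boundary between two recordings.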

I should reword this better next time I edit it…


Hi, could you elaborate a bit more on this?
I’m having sort of the same question, see dimensions of minibatch