Porting an RNN model from PyTorch to Flux

Hi All,

I’m trying to port this example of a recurrent neural network from PyTorch to Flux to help me learn the API. I know I’m not putting the data together with the loss function in the right way (I’m using the char-rnn model from the model zoo as a guide), but I was wondering whether anyone would chip in and point out where I’m going wrong. Apart from only training on a single minibatch, I’m trying to stay as faithful to the original implementation as possible. The code below gives

MethodError: no method matching isless(::TrackedArray{…,Array{Float64,2}}, ::Array{Float64,2})
Closest candidates are:
  isless(!Matched::Missing, ::Any) at missing.jl:66
  isless(::Any, !Matched::Missing) at missing.jl:67

using Flux
using Flux: chunk, batchseq, onehot, onehotbatch, mse
using StatsBase: sample, wsample
# using CuArrays

# Make simulated sequence
bases = ['A','C','G','T']
alphabet = [bases;'_']

seq_len = 220
seq = [sample(bases) for i in 1:seq_len]
seq = join(seq)

# Simulate read errors: random insertions (pins), deletions (pdel)
# and substitutions (psub) applied at each position of seq.
function sim_error(seq,pins=0.05,pdel=0.05,psub=0.01)
    out_seq = []
    for c in seq
        while true
            r=rand()
            if r < pins
                push!(out_seq,sample(bases))
            else
                break
            end
        end
        r -= pins
        if r < pdel
            continue
        end
        r -= pdel
        if r < psub
            push!(out_seq,sample(bases))
            continue
        end
        push!(out_seq,c)
    end
    return join(out_seq)
end

num_sim = 20
seqs = [sim_error(seq) for i in 1:num_sim]
max_len = maximum([length(s) for s in seqs])

# Generate one-hot
input_t = [onehotbatch(input[1:(end-1)],bases) for input in seqs]
output_t = [onehotbatch(input[2:end],bases) for input in seqs]

# Define model
hidden_dim = 32
layer1_dim = 12
layer2_dim = 12
num_bases = 4
m = Chain(
  LSTM(num_bases, hidden_dim),
  Dense(hidden_dim,layer1_dim),
  relu,
  Dense(layer1_dim,layer2_dim),
  relu,
  Dense(layer2_dim,num_bases)
  )

# Define MSE loss
function loss(xs, ys)
    l = sum(Flux.mse.(m.(xs)), ys)
    Flux.truncate!(m)
    return l
end

# Set optimiser
lr = 0.1
opt = SGD(params(m), lr)

# Train one minibatch
mini_batch_size = 5
idx = [sample(1:num_sim) for x in 1:mini_batch_size]
train = [(input_t[i], output_t[i]) for i in idx]
Flux.train!(loss, train, opt)
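
For reference, each training pair is just a one-hot matrix and the same sequence shifted by one base (sizes as produced by the code above):

size(input_t[1])   # (4, length(seqs[1]) - 1): one column per base
size(output_t[1])  # same size, the sequence shifted by one position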

Hello,
I believe you should replace relu with x -> relu.(x) in your model; this is because you’re passing a matrix to relu, which expects a scalar.
You can easily debug a Chain model by executing parts of it, e.g. checking that m[1:2](x) runs without error.
However, when running your model with this change, I get MethodError: no method matching mse(::TrackedArray{…,Array{Float64,2}}). I believe this is because the dimensions of xs and ys don’t match up.
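
Concretely, something along these lines (a rough sketch with a dummy one-hot batch) pinpoints which layer throws:

x = onehotbatch(sample(bases, 10), bases)  # dummy 4×10 one-hot input
m[1](x)      # the LSTM on its own is fine
m[1:2](x)    # LSTM plus the first Dense is fine
m[1:3](x)    # the bare relu throws (the isless MethodError from your post)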

Jules

Thanks! I didn’t use x -> relu.(x) because this seemed to be taken care of in the layer description: the char-rnn example in the model zoo just has a plain softmax rather than a broadcast softmax. Making the change as you describe:

m = Chain(
  LSTM(num_bases, hidden_dim),
  Dense(hidden_dim,layer1_dim),
  x->relu.(x),
  Dense(layer1_dim,layer2_dim),
  x->relu.(x),
  Dense(layer2_dim,num_bases)
  )

does indeed fix the problem, although I’m still puzzled why this isn’t needed in the char-rnn example.

P.S. The error you got was due to a silly typo in my loss function; the corrected version is:

function loss(xs, ys)
    l = sum(Flux.mse.(m.(xs), ys))
    Flux.truncate!(m)
    return l
end

Dense(hidden_dim,layer1_dim,relu) is simpler, right? And it doesn’t need the dot.
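
i.e. something like this, with the activations folded into the Dense layers:

m = Chain(
  LSTM(num_bases, hidden_dim),
  Dense(hidden_dim,layer1_dim,relu),
  Dense(layer1_dim,layer2_dim,relu),
  Dense(layer2_dim,num_bases)
  )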


Don’t take my word for it, but I believe softmax is treated as a layer (just like Dense, Conv…), while relu is an activation function (like sigmoid, elu…). As @xiaodai noted, it’s simpler to just pass the activation function as an argument to your layer.
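
A quick way to see the difference (just a sketch in the REPL):

x = randn(4, 3)
softmax(x)           # works on the whole matrix: each column is normalised independently
# relu(x)            # would throw a MethodError: relu is defined for scalars
relu.(x)             # fine: broadcast elementwise
Dense(4, 4, relu)    # or let the layer apply the activation elementwise for you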

Thanks! I guess that considering softmax as a layer makes sense given that it operates on a vector.

I’m still having problems porting the code: the script below starts to run, but then gives me a NaN loss.

using Flux
using Flux: chunk, batchseq, onehot, onehotbatch, onecold, mse, crossentropy, throttle, @epochs
using Random
using StatsBase: sample, wsample
# using CuArrays

# Make simulated sequence
bases = ['A','C','G','T']
alphabet = [bases;'_']
Random.seed!(1234)

seq_len = 220
seq = [sample(bases) for i in 1:seq_len]
seq = join(seq)

# Simulate read errors: random insertions (pins), deletions (pdel)
# and substitutions (psub) applied at each position of seq.
function sim_error(seq,pins=0.05,pdel=0.05,psub=0.01)
    out_seq = []
    for c in seq
        while true
            r=rand()
            if r < pins
                push!(out_seq,sample(bases))
            else
                break
            end
        end
        r -= pins
        if r < pdel
            continue
        end
        r -= pdel
        if r < psub
            push!(out_seq,sample(bases))
            continue
        end
        push!(out_seq,c)
    end
    return join(out_seq)
end

num_sim = 20
seqs = [sim_error(seq,0.0,0.0,0.01) for i in 1:num_sim]
max_len = maximum([length(s) for s in seqs])

# Generate one-hot
input_t = [onehotbatch(input[1:(end-1)],bases) for input in seqs]
output_t = [onehotbatch(input[2:end],bases) for input in seqs]

# Define MSE loss
function loss(xs, ys)
    l = sum(Flux.mse.(m.(xs), ys))
    Flux.truncate!(m)
    return l
end

# Define model
hidden_dim = 32
layer1_dim = 12
layer2_dim = 12
num_bases = 4
m = Chain(
  LSTM(num_bases, hidden_dim),
  Dense(hidden_dim,layer1_dim,relu),
  Dense(layer1_dim,layer2_dim,relu),
  Dense(layer2_dim,num_bases)
  )
# Set optimiser
lr = 0.1
opt = SGD(params(m), lr)

# Train one minibatch
mini_batch_size = 20
#idx = [sample(1:num_sim) for x in 1:mini_batch_size]
idx = 1:num_sim
train = [(input_t[i], output_t[i]) for i in idx]
#test = [(input_t[i], output_t[i]) for i in idx]
test_x = [input_t[i] for i in idx]
test_y = [output_t[i] for i in idx]
evalcb = throttle(() -> @show(loss(test_x[1],test_y[1])), 10)
@epochs 10 Flux.train!(loss, train, opt, cb=evalcb)
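
To narrow this down I can swap the callback for one that reports when the loss stops being finite (just a monitoring sketch, not a fix):

function check_loss()
    l = loss(test_x[1], test_y[1])
    @show l
    isfinite(Flux.Tracker.data(l)) || @warn "loss is no longer finite"
end
evalcb = throttle(check_loss, 10)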