Flux newbie: simple Markov

Note: Flux newbie here.

I am trying to learn Flux, with the end goal of experimenting with different GRU-like models,
one-hot to one-hot with cross-entropy; my background is NLP.

I started learning Flux by trying to set up the simplest possible Markov chain, where h0 is the prediction of the first observation and seq_length = 10. So

A) input = x,
B) ht = \hat{x},
C) correct output = x.
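
In symbols, my reading of A)-C) is (with softmax outputs, since I use logitcrossentropy below):

\hat{x}_1 = softmax(h0)
\hat{x}_{t+1} = softmax(Wxh * x_t),  t = 1, …, seq_length - 1

so softmax(h0) should end up as the initial distribution and the columns of softmax(Wxh) as the columns of the transition matrix.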

My exercise: estimate h0 and Wxh (and check against the simple non-NN estimate). Since I am not very intelligent I failed at this simple exercise, because RNN assumes there is a direct connection from the input signal to the output. And if I try to write some code of my own and wrap it in Recurrent(), it complains that initialstates() is not implemented, and then about a wrong signature when I try to implement it. Sooo …

  1. Is there any more comprehensive introduction with examples anywhere?

  2. I can see that the source code in recurrent.jl is actually readable and has lots of comments. Would a better plan be to take a couple of days and reverse engineer that file to figure out what is going on?

  3. What is a good way to think about a prediction model? I mean, I could take x[1:end-1] as input and x[2:end] as output, but that is a little bit silly since I do want a prediction of the first time point, both in my exercise and in the more complex models to come.
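
For concreteness, the shifted-pair convention I mean is something like this (a sketch with stand-in data; x holds category indices, one row per sequence, one column per time step):

x = rand(1:3, 100, 10)     # stand-in: 100 sequences of 10 categorical observations
inputs  = x[:, 1:end-1]    # steps 1..9 as inputs
targets = x[:, 2:end]      # steps 2..10 as targets
# versus keeping all 10 steps as targets and letting a learned h0 supply
# the prediction for the first one, which is what I try below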

So here it is. Problem: it refuses to compute the gradient.

I have a test batch of 100 sequences, 10 steps each. It is actually a simple Markov chain, but I
just want to check my understanding before going full custom LSTM.

I follow the gist of the introductory text on recurrent networks, so xo is a vector of 10 one-hot matrices of size 3 x 100, so that the time steps in a sequence can be computed in parallel.
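
Concretely, something like this (a sketch; x is the 100 x 10 matrix of category indices from the complete code I post further down):

xo = [Flux.onehotbatch(x[:, t], 1:3) for t in 1:10]  # 10 one-hot matrices, each 3 x 100
size(xo[1])  # (3, 100)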

struct MyRecurrentModel{H,C}
    h0::H
    W::C
end

Flux.@layer MyRecurrentModel trainable=(h0, W)

function MyRecurrentModel(ncats::Int)
    return MyRecurrentModel(
        zeros(Float32, ncats), 
        zeros(Float32, ncats, ncats)
    )
end

function (m::MyRecurrentModel)(xo)
    y = []
    ht = repeat(m.h0, 1, size(xo[1], 2))
    for xt in xo
        y = [y; [ht]] # this prediction
        ht = m.W * xt # next prediction (meaningless for last one, haha)
    end
    return y
end

The model computes fine. The loss function computes fine.

using Optimisers: AdamW

model = MyRecurrentModel(catcount)

function loss(model, xo)
    ŷ = model(xo)
    return Flux.logitcrossentropy(hcat(ŷ...), hcat(xo...))
end

# initialize the model and optimizer
model = MyRecurrentModel(catcount)
opt_state = Flux.setup(AdamW(1e-3), model)

But the gradient refuses.

# compute the gradient and update the model
g = gradient(m -> loss(m, xo), model) #[1]

I guess Zygote does not like something. Can anybody help me?

MethodError: no method matching (::ChainRulesCore.ProjectTo{AbstractArray, ...

Can you post the whole error? And perhaps how catcount and xo are defined?

Thanks for looking at it. Here is the complete source code.

A simple Markov model

 P = [
     0.8 0.1 0.2;
     0.1 0.7 0.2;
     0.1 0.2 0.6
 ]

3×3 Matrix{Float64}:
 0.8  0.1  0.2
 0.1  0.7  0.2
 0.1  0.2  0.6

Helper function to sample from a categorical distribution

function sample(outcomes, weights)
    cumsum_weights = cumsum(weights)
    r = rand()
    for i in eachindex(outcomes)
        if r <= cumsum_weights[i]
            return outcomes[i]
        end
    end
    return outcomes[end]
end
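
A quick sanity check of the helper, using the P defined above (one draw of the next state given that the current state is 1):

sample(1:3, P[:, 1])  # returns 1 with probability 0.8, 2 with 0.1, 3 with 0.1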

Simulate

seqlen = 10
obscnt = 100
catcount = 3
catrange = Int16(1):Int16(catcount)
x = zeros(Int16, seqlen * obscnt)
x[1] = 1
for t in 2:length(x)
    x[t] = sample(1:3, P[:, x[t - 1]])
end
x = permutedims(reshape(x, seqlen, obscnt))

100×10 Matrix{Int16}:
1 1 1 1 1 1 1 1 1 1
1 1 3 2 2 2 2 2 1 1
[…]

Simple ML-estimation

P0 = zeros(Float64, 3)
P = zeros(Float64, 3, 3)
for j in 1:obscnt
    P0[x[j, 1]] += 1
    for i in 2:seqlen
        P[x[j, i], x[j, i - 1]] += 1
    end
end
P ./ sum(P, dims=1)
3×3 Matrix{Float64}:
 0.809399   0.0918728  0.179487
 0.0861619  0.70318    0.209402
 0.104439   0.204947   0.611111
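
The first-observation counts in P0 give the corresponding estimate of the initial distribution, which softmax(h0) should approach after training:

P0 ./ sum(P0)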

An RNN model should give the same result

using Flux
using OneHotArrays
catcount = 3
catrange = Int16(1):Int16(catcount)
xoh = [Flux.onehotbatch(x[:, i], catrange) for i = 1:seqlen]  # one 3 x 100 one-hot matrix per time step i
xoh[1]
3×100 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 1  1  1  ⋅  1  ⋅  ⋅  ⋅  ⋅  1  1  ⋅  1  …  ⋅  ⋅  ⋅  1  ⋅  1  ⋅  1  1  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  1  ⋅  ⋅  ⋅  1  1  ⋅  ⋅  ⋅  ⋅     1  1  1  ⋅  1  ⋅  1  ⋅  ⋅  1  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  1  1  ⋅  ⋅  ⋅  ⋅  1  ⋅     ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  1  1
struct MyRNN{H,C}
    h0::H
    W::C
end
​
Flux.@layer MyRNN trainable=(h0, W)
​
function MyRNN(nc::Int)
    return MyRNN(
        zeros(Float32, nc), 
        zeros(Float32, nc, nc)
    )
end
​
function (m::MyRNN)(xoh)
    y = []
    # repeat() does not work with gradient()
    # ht = repeat(m.h0, 1, size(xoh[1], 2))
    ht = m.h0 * ones(1, size(xoh[1], 2)) 
    for xt in xoh # Loop over seqlen, all sequences in parallel
        y = [y; [ht]] # this prediction
        ht = m.W * xt # next prediction - last one unused :-)
    end
    return y
end
model = MyRNN(catcount)
ŷ = model(xoh)
size(ŷ), size(ŷ[1]), size(ŷ[2]), model

((10,), (3, 100), (3, 100), MyRNN{Vector{Float32}, Matrix{Float32}}(Float32[0.0, 0.0, 0.0], Float32[0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0]))

using Optimisers: AdamW
​
function loss(model, xoh)
    ŷ = model(xoh)
    return Flux.logitcrossentropy(hcat(ŷ ...), hcat(xoh ...))
end
​
opt_state = Flux.setup(AdamW(1e-3), model)
g = gradient(m -> loss(m, xoh), model)[1]
#Flux.update!(opt_state, model, g)

MethodError: no method matching (::ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{elements::Vector{ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{element::ChainRulesCore.ProjectTo{Float32, @NamedTuple{}}, axes::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, axes::Tuple{Base.OneTo{Int64}}}})(::Tuple{ChainRulesCore.InplaceableThunk{ChainRulesCore.Thunk{ChainRules.var"#rrule##90#rrule##91"{ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{element::ChainRulesCore.ProjectTo{Float32, @NamedTuple{}}, axes::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}, Tuple{Colon, UnitRange{Int64}}, Matrix{Float64}}}, ChainRules.var"#rrule##88#rrule##89"{Tuple{Colon, UnitRange{Int64}}, Matrix{Float64}}}})
The object of type ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{elements::Vector{ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{element::ChainRulesCore.ProjectTo{Float32, @NamedTuple{}}, axes::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, axes::Tuple{Base.OneTo{Int64}}}} exists, but no method is defined for this combination of argument types when trying to treat it as a callable object.

Here’s the full error from the above:

julia> loss(model, xoh)
1.0986122886681082

julia> g = gradient(m -> loss(m, xoh), model)[1]
ERROR: MethodError: no method matching (::ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{…}})(::Tuple{ChainRulesCore.InplaceableThunk{…}})
The object of type `ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{elements::Vector{ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{element::ChainRulesCore.ProjectTo{Float32, @NamedTuple{}}, axes::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, axes::Tuple{Base.OneTo{Int64}}}}` exists, but no method is defined for this combination of argument types when trying to treat it as a callable object.

Closest candidates are:
  (::ChainRulesCore.ProjectTo{T})(::ChainRulesCore.NotImplemented) where T
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/Vsbj9/src/projection.jl:121
  (::ChainRulesCore.ProjectTo)(::ChainRulesCore.Thunk)
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/Vsbj9/src/projection.jl:124
  (::ChainRulesCore.ProjectTo)(::ChainRulesCore.InplaceableThunk)
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/Vsbj9/src/projection.jl:125
  ...

Stacktrace:
  [1] (::ChainRules.var"#487#492"{ChainRulesCore.ProjectTo{…}, Tuple{…}, ChainRulesCore.Tangent{…}})()
    @ ChainRules ~/.julia/packages/ChainRules/14CDN/src/rulesets/Base/array.jl:314
  [2] unthunk
    @ ~/.julia/packages/ChainRulesCore/Vsbj9/src/tangent_types/thunks.jl:213 [inlined]
  [3] unthunk(x::ChainRulesCore.InplaceableThunk{ChainRulesCore.Thunk{…}, ChainRules.var"#486#491"{…}})
    @ ChainRulesCore ~/.julia/packages/ChainRulesCore/Vsbj9/src/tangent_types/thunks.jl:252
  [4] getindex(a::ChainRulesCore.InplaceableThunk{ChainRulesCore.Thunk{…}, ChainRules.var"#486#491"{…}}, args::Int64)
    @ ChainRulesCore ~/.julia/packages/ChainRulesCore/Vsbj9/src/tangent_types/thunks.jl:64
  [5] (::ChainRules.var"#444#446"{ChainRulesCore.InplaceableThunk{…}, Tuple{…}})(n::Int64)
    @ ChainRules ~/.julia/packages/ChainRules/14CDN/src/rulesets/Base/array.jl:53
  [6] ntuple
    @ ./ntuple.jl:19 [inlined]
  [7] (::ChainRules.var"#vect_pullback#445"{…})(ȳ::ChainRulesCore.InplaceableThunk{…})
    @ ChainRules ~/.julia/packages/ChainRules/14CDN/src/rulesets/Base/array.jl:53
  [8] (::Zygote.ZBack{…})(dy::ChainRulesCore.InplaceableThunk{…})
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/chainrules.jl:222
  [9] MyRNN
    @ ./REPL[39]:16 [inlined]
 [10] (::Zygote.Pullback{Tuple{…}, Any})(Δ::Tuple{ChainRulesCore.InplaceableThunk{…}, Vararg{…}})
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface2.jl:0
 [11] loss
    @ ./REPL[49]:2 [inlined]
 [12] (::Zygote.Pullback{Tuple{…}, Tuple{…}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface2.jl:0
 [13] #13
    @ ./REPL[53]:1 [inlined]
 [14] (::Zygote.Pullback{Tuple{var"#13#14", MyRNN{…}}, Tuple{Zygote.Pullback{…}, Zygote.var"#2006#back#208"{…}}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface2.jl:0
 [15] (::Zygote.var"#88#89"{Zygote.Pullback{Tuple{…}, Tuple{…}}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface.jl:97
 [16] gradient(f::Function, args::MyRNN{Vector{Float32}, Matrix{Float32}})
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface.jl:154
 [17] #gradient#1
    @ ~/.julia/packages/Flux/uRn8o/src/gradient.jl:44 [inlined]
 [18] gradient(f::Function, args::MyRNN{Vector{Float32}, Matrix{Float32}})
    @ Flux ~/.julia/packages/Flux/uRn8o/src/gradient.jl:31
 [19] top-level scope
    @ REPL[53]:1
Some type information was truncated. Use `show(err)` to see complete types.

Note, BTW, that the code above contains various invisible characters. Please post it as one giant code block, with everything needed so that it runs in a fresh Julia session.

Attempts to fix things…

julia> model(xoh) # Vector{Any} is normally bad
10-element Vector{Any}:
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]

julia> gradient(m -> sum(m(xoh)[end]), model)  # but this simple gradient works
((h0 = nothing, W = Float32[35.0 43.0 22.0; 35.0 43.0 22.0; 35.0 43.0 22.0]),)

julia> function (m::MyRNN)(xoh)
           y =  Matrix{Float32}[]  # trying to fix that, but actually made new errors
           # repeat() does not work with gradient()
           # ht = repeat(m.h0, 1, size(xoh[1], 2))
           ht = m.h0 * ones(1, size(xoh[1], 2)) 
           for xt in xoh # Loop over seqlen, all sequences in parallel
               y = [y; [ht]] # this prediction
               ht = m.W * xt # next prediction - last one unused :-)
           end
           return y
       end

julia> model(xoh) 
10-element Vector{Matrix{Float64}}:
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]

julia> gradient(m -> sum(m(xoh)[end]), model)
ERROR: MethodError: no method matching iterate(::Nothing)
The function `iterate` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  iterate(::Flux.Optimise.Optimiser, Any...; kwargs...)
   @ Flux ~/.julia/packages/MacroTools/Ar0jT/src/examples/forward.jl:17
  iterate(::BitSet)
   @ Base bitset.jl:333
  iterate(::BitSet, ::Any)
   @ Base bitset.jl:333
  ...

Stacktrace:
  [1] (::Zygote.var"#545#548")(dy::Nothing)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/lib/array.jl:68
  [2] (::Zygote.var"#2627#back#549"{…})(Δ::ChainRulesCore.InplaceableThunk{…})
    @ Zygote ~/.julia/packages/ZygoteRules/CkVIK/src/adjoint.jl:72
  [3] MyRNN
    @ ./REPL[62]:2 [inlined]
  [4] (::Zygote.Pullback{…})(Δ::ChainRulesCore.InplaceableThunk{…})
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface2.jl:0
  [5] #31
    @ ./REPL[64]:1 [inlined]
  [6] (::Zygote.Pullback{Tuple{…}, Tuple{…}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface2.jl:0
  [7] (::Zygote.var"#88#89"{Zygote.Pullback{Tuple{…}, Tuple{…}}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface.jl:97
  [8] gradient(f::Function, args::MyRNN{Vector{Float32}, Matrix{Float32}})
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface.jl:154
  [9] #gradient#1
    @ ~/.julia/packages/Flux/uRn8o/src/gradient.jl:44 [inlined]
 [10] gradient(f::Function, args::MyRNN{Vector{Float32}, Matrix{Float32}})
    @ Flux ~/.julia/packages/Flux/uRn8o/src/gradient.jl:31
 [11] top-level scope
    @ REPL[64]:1
Some type information was truncated. Use `show(err)` to see complete types.

Maybe hcat(ŷ ...) is the problem? Splatting dynamic things (like vectors) is usually bad for Julia, and especially bad for Zygote. I’m not certain stack is exactly equivalent, since it makes a differently shaped array, but it does seem to run, as shown after the quick shape check below:
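
The two layouts differ only in shape, and logitcrossentropy averages over every element either way, so the loss value should be the same. A quick check with stand-in data:

ŷ = [rand(Float32, 3, 100) for _ in 1:10]  # stand-in for the model output
size(hcat(ŷ...))  # (3, 1000)    -- time steps concatenated along columns
size(stack(ŷ))    # (3, 100, 10) -- time steps along a third dimension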

julia> function loss(model, xoh)
           ŷ = model(xoh)
           return Flux.logitcrossentropy(stack(ŷ), stack(xoh))
       end
loss (generic function with 1 method)

julia> g = gradient(m -> loss(m, xoh), model)[1]
(h0 = Float32[-0.0016666667, -0.009666666, 0.011333333], W = Float32[-0.21000004 0.12900002 0.06600001; 0.10500002 -0.25800005 0.06600001; 0.10500002 0.12900002 -0.13200001])
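
From here, a minimal training loop should recover the Markov estimates (a sketch, not run here, assuming the stack-based loss above; softmax over the columns of W should approach P ./ sum(P, dims=1), and softmax(h0) the initial distribution):

for _ in 1:2000  # plain full-batch AdamW steps on the same data
    g = gradient(m -> loss(m, xoh), model)[1]
    Flux.update!(opt_state, model, g)
end
softmax(model.W; dims=1)  # compare with the ML transition estimate
softmax(model.h0)         # compare with P0 ./ sum(P0)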

Yes, it seems like 'stack' works, while 'hcat' does not.
And yes, next time I’ll make one code block instead of copying my Jupyter blocks.

Thanks a lot for your help.
