Flux newbie: simple Markov

Note: Flux newbie here.

I am trying to learn Flux, with the end goal of experimenting with different GRU-like models,
one-hot to one-hot with cross-entropy; my background is NLP.

I started learning Flux by trying to set up the simplest possible Markov chain, where h0 is the prediction of the first observation and seq_length = 10. So

A) input = x,
B) ht = \hat{x},
C) correct output = x.
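
In symbols, my reading of A)-C) is (with softmax outputs, since I use logitcrossentropy below):

\hat{x}_1 = softmax(h0)
\hat{x}_{t+1} = softmax(Wxh * x_t),  t = 1, …, seq_length - 1

so softmax(h0) should end up as the initial distribution and the columns of softmax(Wxh) as the columns of the transition matrix.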

My exercise: estimate h0 and Wxh (and check against the simple non-NN estimate). Since I am not very intelligent I failed at this simple exercise, because RNN assumes there is a direct connection from the input signal to the output. And if I try to write some code of my own and wrap it in Recurrent(), it complains that initialstates() is not implemented, and then about a wrong signature when I try to implement it. Sooo …

  1. Is there any more comprehensive introduction with examples anywhere?

  2. I can see that the source code in recurrent.jl is actually readable and has lots of comments. Would a better plan be to take a couple of days and reverse engineer that file to figure out what is going on?

  3. What is a good way to think about a prediction model? I mean, I could take x[1:end-1] as input and x[2:end] as output, but that is a little bit silly since I do want a prediction of the first time point, both in my exercise and in the more complex models to come.
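
For concreteness, the shifted-pair convention I mean is something like this (a sketch with stand-in data; x holds category indices, one row per sequence, one column per time step):

x = rand(1:3, 100, 10)     # stand-in: 100 sequences of 10 categorical observations
inputs  = x[:, 1:end-1]    # steps 1..9 as inputs
targets = x[:, 2:end]      # steps 2..10 as targets
# versus keeping all 10 steps as targets and letting a learned h0 supply
# the prediction for the first one, which is what I try below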

So here it is. Problem: it refuses to compute the gradient.

I have a test batch of 100 sequences, 10 steps each. It is actually a simple Markov chain, but I
just want to check my understanding before going full custom LSTM.

I follow the gist of the introductory text on recurrent networks, so xo is a vector of 10 one-hot matrices of size 3 x 100, so that the time steps in a sequence can be computed in parallel.
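
Concretely, something like this (a sketch; x is the 100 x 10 matrix of category indices from the complete code I post further down):

xo = [Flux.onehotbatch(x[:, t], 1:3) for t in 1:10]  # 10 one-hot matrices, each 3 x 100
size(xo[1])  # (3, 100)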

struct MyRecurrentModel{H,C}
    h0::H
    W::C
end

Flux.@layer MyRecurrentModel trainable=(h0, W)

function MyRecurrentModel(ncats::Int)
    return MyRecurrentModel(
        zeros(Float32, ncats), 
        zeros(Float32, ncats, ncats)
    )
end

function (m::MyRecurrentModel)(xo)
    y = []
    ht = repeat(m.h0, 1, size(xo[1], 2))
    for xt in xo
        y = [y; [ht]] # this prediction
        ht = m.W * xt # next prediction (meaningless for last one, haha)
    end
    return y
end

The model computes fine. The loss function computes fine.

using Optimisers: AdamW

model = MyRecurrentModel(catcount)

function loss(model, xo)
    ŷ = model(xo)
    return Flux.logitcrossentropy(hcat(ŷ...), hcat(xo...))
end

# initialize the model and optimizer
model = MyRecurrentModel(catcount)
opt_state = Flux.setup(AdamW(1e-3), model)

But the gradient refuses.

# compute the gradient and update the model
g = gradient(m -> loss(m, xo), model) #[1]

I guess Zygote does not like something. Can anybody help me?

MethodError: no method matching (::ChainRulesCore.ProjectTo{AbstractArray, ...

Can you post the whole error? And perhaps how catcount and xo are defined?

Thanks for looking at it. Here is the complete source code.

A simple Markov model

 P = [
     0.8 0.1 0.2;
     0.1 0.7 0.2;
     0.1 0.2 0.6
 ]

3×3 Matrix{Float64}:
 0.8  0.1  0.2
 0.1  0.7  0.2
 0.1  0.2  0.6

Helper function to sample from a categorical distribution

function sample(outcomes, weights)
    cumsum_weights = cumsum(weights)
    r = rand()
    for i in eachindex(outcomes)
        if r <= cumsum_weights[i]
            return outcomes[i]
        end
    end
    return outcomes[end]
end
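
A quick sanity check of the helper, using the P defined above (one draw of the next state given that the current state is 1):

sample(1:3, P[:, 1])  # returns 1 with probability 0.8, 2 with 0.1, 3 with 0.1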

Simulate

seqlen = 10
obscnt = 100
catcount = 3
catrange = Int16(1):Int16(catcount)
x = zeros(Int16, seqlen * obscnt)
x[1] = 1
for t in 2:length(x)
    x[t] = sample(1:3, P[:, x[t - 1]])
end
x = permutedims(reshape(x, seqlen, obscnt))

100×10 Matrix{Int16}:
1 1 1 1 1 1 1 1 1 1
1 1 3 2 2 2 2 2 1 1
[…]

Simple ML-estimation

P0 = zeros(Float64, 3)
P = zeros(Float64, 3, 3)
for j in 1:obscnt
    P0[x[j, 1]] += 1
    for i in 2:seqlen
        P[x[j, i], x[j, i - 1]] += 1
    end
end
P ./ sum(P, dims=1)
3×3 Matrix{Float64}:
 0.809399   0.0918728  0.179487
 0.0861619  0.70318    0.209402
 0.104439   0.204947   0.611111
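
The first-observation counts in P0 give the corresponding estimate of the initial distribution, which softmax(h0) should approach after training:

P0 ./ sum(P0)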

An RNN model should give the same result

using Flux
using OneHotArrays
catcount = 3
catrange = Int16(1):Int16(catcount)
xoh = [Flux.onehotbatch(x[:, i], catrange) for i = 1:seqlen]  # one 3 x 100 one-hot matrix per time step i
xoh[1]
3×100 OneHotMatrix(::Vector{UInt32}) with eltype Bool:
 1  1  1  ⋅  1  ⋅  ⋅  ⋅  ⋅  1  1  ⋅  1  …  ⋅  ⋅  ⋅  1  ⋅  1  ⋅  1  1  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  1  ⋅  ⋅  ⋅  1  1  ⋅  ⋅  ⋅  ⋅     1  1  1  ⋅  1  ⋅  1  ⋅  ⋅  1  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  ⋅  1  1  ⋅  ⋅  ⋅  ⋅  1  ⋅     ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  ⋅  1  1
struct MyRNN{H,C}
    h0::H
    W::C
end
​
Flux.@layer MyRNN trainable=(h0, W)
​
function MyRNN(nc::Int)
    return MyRNN(
        zeros(Float32, nc), 
        zeros(Float32, nc, nc)
    )
end
​
function (m::MyRNN)(xoh)
    y = []
    # repeat() does not work with gradient()
    # ht = repeat(m.h0, 1, size(xoh[1], 2))
    ht = m.h0 * ones(1, size(xoh[1], 2)) 
    for xt in xoh # Loop over seqlen, all sequences in parallel
        y = [y; [ht]] # this prediction
        ht = m.W * xt # next prediction - last one unused :-)
    end
    return y
end
model = MyRNN(catcount)
ŷ = model(xoh)
size(ŷ), size(ŷ[1]), size(ŷ[2]), model

((10,), (3, 100), (3, 100), MyRNN{Vector{Float32}, Matrix{Float32}}(Float32[0.0, 0.0, 0.0], Float32[0.0 0.0 0.0; 0.0 0.0 0.0; 0.0 0.0 0.0]))

using Optimisers: AdamW
​
function loss(model, xoh)
    ŷ = model(xoh)
    return Flux.logitcrossentropy(hcat(ŷ ...), hcat(xoh ...))
end
​
opt_state = Flux.setup(AdamW(1e-3), model)
g = gradient(m -> loss(m, xoh), model)[1]
#Flux.update!(opt_state, model, g)

MethodError: no method matching (::ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{elements::Vector{ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{element::ChainRulesCore.ProjectTo{Float32, @NamedTuple{}}, axes::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, axes::Tuple{Base.OneTo{Int64}}}})(::Tuple{ChainRulesCore.InplaceableThunk{ChainRulesCore.Thunk{ChainRules.var"#rrule##90#rrule##91"{ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{element::ChainRulesCore.ProjectTo{Float32, @NamedTuple{}}, axes::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}, Tuple{Colon, UnitRange{Int64}}, Matrix{Float64}}}, ChainRules.var"#rrule##88#rrule##89"{Tuple{Colon, UnitRange{Int64}}, Matrix{Float64}}}})
The object of type ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{elements::Vector{ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{element::ChainRulesCore.ProjectTo{Float32, @NamedTuple{}}, axes::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, axes::Tuple{Base.OneTo{Int64}}}} exists, but no method is defined for this combination of argument types when trying to treat it as a callable object.

Here’s the full error from the above:

julia> loss(model, xoh)
1.0986122886681082

julia> g = gradient(m -> loss(m, xoh), model)[1]
ERROR: MethodError: no method matching (::ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{…}})(::Tuple{ChainRulesCore.InplaceableThunk{…}})
The object of type `ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{elements::Vector{ChainRulesCore.ProjectTo{AbstractArray, @NamedTuple{element::ChainRulesCore.ProjectTo{Float32, @NamedTuple{}}, axes::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}}}}, axes::Tuple{Base.OneTo{Int64}}}}` exists, but no method is defined for this combination of argument types when trying to treat it as a callable object.

Closest candidates are:
  (::ChainRulesCore.ProjectTo{T})(::ChainRulesCore.NotImplemented) where T
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/Vsbj9/src/projection.jl:121
  (::ChainRulesCore.ProjectTo)(::ChainRulesCore.Thunk)
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/Vsbj9/src/projection.jl:124
  (::ChainRulesCore.ProjectTo)(::ChainRulesCore.InplaceableThunk)
   @ ChainRulesCore ~/.julia/packages/ChainRulesCore/Vsbj9/src/projection.jl:125
  ...

Stacktrace:
  [1] (::ChainRules.var"#487#492"{ChainRulesCore.ProjectTo{…}, Tuple{…}, ChainRulesCore.Tangent{…}})()
    @ ChainRules ~/.julia/packages/ChainRules/14CDN/src/rulesets/Base/array.jl:314
  [2] unthunk
    @ ~/.julia/packages/ChainRulesCore/Vsbj9/src/tangent_types/thunks.jl:213 [inlined]
  [3] unthunk(x::ChainRulesCore.InplaceableThunk{ChainRulesCore.Thunk{…}, ChainRules.var"#486#491"{…}})
    @ ChainRulesCore ~/.julia/packages/ChainRulesCore/Vsbj9/src/tangent_types/thunks.jl:252
  [4] getindex(a::ChainRulesCore.InplaceableThunk{ChainRulesCore.Thunk{…}, ChainRules.var"#486#491"{…}}, args::Int64)
    @ ChainRulesCore ~/.julia/packages/ChainRulesCore/Vsbj9/src/tangent_types/thunks.jl:64
  [5] (::ChainRules.var"#444#446"{ChainRulesCore.InplaceableThunk{…}, Tuple{…}})(n::Int64)
    @ ChainRules ~/.julia/packages/ChainRules/14CDN/src/rulesets/Base/array.jl:53
  [6] ntuple
    @ ./ntuple.jl:19 [inlined]
  [7] (::ChainRules.var"#vect_pullback#445"{…})(ȳ::ChainRulesCore.InplaceableThunk{…})
    @ ChainRules ~/.julia/packages/ChainRules/14CDN/src/rulesets/Base/array.jl:53
  [8] (::Zygote.ZBack{…})(dy::ChainRulesCore.InplaceableThunk{…})
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/chainrules.jl:222
  [9] MyRNN
    @ ./REPL[39]:16 [inlined]
 [10] (::Zygote.Pullback{Tuple{…}, Any})(Δ::Tuple{ChainRulesCore.InplaceableThunk{…}, Vararg{…}})
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface2.jl:0
 [11] loss
    @ ./REPL[49]:2 [inlined]
 [12] (::Zygote.Pullback{Tuple{…}, Tuple{…}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface2.jl:0
 [13] #13
    @ ./REPL[53]:1 [inlined]
 [14] (::Zygote.Pullback{Tuple{var"#13#14", MyRNN{…}}, Tuple{Zygote.Pullback{…}, Zygote.var"#2006#back#208"{…}}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface2.jl:0
 [15] (::Zygote.var"#88#89"{Zygote.Pullback{Tuple{…}, Tuple{…}}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface.jl:97
 [16] gradient(f::Function, args::MyRNN{Vector{Float32}, Matrix{Float32}})
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface.jl:154
 [17] #gradient#1
    @ ~/.julia/packages/Flux/uRn8o/src/gradient.jl:44 [inlined]
 [18] gradient(f::Function, args::MyRNN{Vector{Float32}, Matrix{Float32}})
    @ Flux ~/.julia/packages/Flux/uRn8o/src/gradient.jl:31
 [19] top-level scope
    @ REPL[53]:1
Some type information was truncated. Use `show(err)` to see complete types.

Note, BTW, that the code above contains various invisible characters. Please post it as one giant code block, with everything needed so that it runs in a fresh Julia session.

Attempts to fix things…

julia> model(xoh) # Vector{Any} is normally bad
10-element Vector{Any}:
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 Float32[0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]

julia> gradient(m -> sum(m(xoh)[end]), model)  # but this simple gradient works
((h0 = nothing, W = Float32[35.0 43.0 22.0; 35.0 43.0 22.0; 35.0 43.0 22.0]),)

julia> function (m::MyRNN)(xoh)
           y =  Matrix{Float32}[]  # trying to fix that, but actually made new errors
           # repeat() does not work with gradient()
           # ht = repeat(m.h0, 1, size(xoh[1], 2))
           ht = m.h0 * ones(1, size(xoh[1], 2)) 
           for xt in xoh # Loop over seqlen, all sequences in parallel
               y = [y; [ht]] # this prediction
               ht = m.W * xt # next prediction - last one unused :-)
           end
           return y
       end

julia> model(xoh) 
10-element Vector{Matrix{Float64}}:
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]
 [0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0; 0.0 0.0 … 0.0 0.0]

julia> gradient(m -> sum(m(xoh)[end]), model)
ERROR: MethodError: no method matching iterate(::Nothing)
The function `iterate` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  iterate(::Flux.Optimise.Optimiser, Any...; kwargs...)
   @ Flux ~/.julia/packages/MacroTools/Ar0jT/src/examples/forward.jl:17
  iterate(::BitSet)
   @ Base bitset.jl:333
  iterate(::BitSet, ::Any)
   @ Base bitset.jl:333
  ...

Stacktrace:
  [1] (::Zygote.var"#545#548")(dy::Nothing)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/lib/array.jl:68
  [2] (::Zygote.var"#2627#back#549"{…})(Δ::ChainRulesCore.InplaceableThunk{…})
    @ Zygote ~/.julia/packages/ZygoteRules/CkVIK/src/adjoint.jl:72
  [3] MyRNN
    @ ./REPL[62]:2 [inlined]
  [4] (::Zygote.Pullback{…})(Δ::ChainRulesCore.InplaceableThunk{…})
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface2.jl:0
  [5] #31
    @ ./REPL[64]:1 [inlined]
  [6] (::Zygote.Pullback{Tuple{…}, Tuple{…}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface2.jl:0
  [7] (::Zygote.var"#88#89"{Zygote.Pullback{Tuple{…}, Tuple{…}}})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface.jl:97
  [8] gradient(f::Function, args::MyRNN{Vector{Float32}, Matrix{Float32}})
    @ Zygote ~/.julia/packages/Zygote/55SqB/src/compiler/interface.jl:154
  [9] #gradient#1
    @ ~/.julia/packages/Flux/uRn8o/src/gradient.jl:44 [inlined]
 [10] gradient(f::Function, args::MyRNN{Vector{Float32}, Matrix{Float32}})
    @ Flux ~/.julia/packages/Flux/uRn8o/src/gradient.jl:31
 [11] top-level scope
    @ REPL[64]:1
Some type information was truncated. Use `show(err)` to see complete types.

Maybe hcat(ŷ ...) is the problem? Splatting dynamic things (like vectors) is usually bad for Julia, and especially bad for Zygote. I’m not certain stack is exactly equivalent, since it makes a differently shaped array, but it does seem to run, as shown after the quick shape check below:
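
The two layouts differ only in shape, and logitcrossentropy averages over every element either way, so the loss value should be the same. A quick check with stand-in data:

ŷ = [rand(Float32, 3, 100) for _ in 1:10]  # stand-in for the model output
size(hcat(ŷ...))  # (3, 1000)    -- time steps concatenated along columns
size(stack(ŷ))    # (3, 100, 10) -- time steps along a third dimension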

julia> function loss(model, xoh)
           ŷ = model(xoh)
           return Flux.logitcrossentropy(stack(ŷ), stack(xoh))
       end
loss (generic function with 1 method)

julia> g = gradient(m -> loss(m, xoh), model)[1]
(h0 = Float32[-0.0016666667, -0.009666666, 0.011333333], W = Float32[-0.21000004 0.12900002 0.06600001; 0.10500002 -0.25800005 0.06600001; 0.10500002 0.12900002 -0.13200001])
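
From here, a minimal training loop should recover the Markov estimates (a sketch, not run here, assuming the stack-based loss above; softmax over the columns of W should approach P ./ sum(P, dims=1), and softmax(h0) the initial distribution):

for _ in 1:2000  # plain full-batch AdamW steps on the same data
    g = gradient(m -> loss(m, xoh), model)[1]
    Flux.update!(opt_state, model, g)
end
softmax(model.W; dims=1)  # compare with the ML transition estimate
softmax(model.h0)         # compare with P0 ./ sum(P0)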

Yes, it seems like 'stack' works, while 'hcat' does not.
And yes, next time I’ll make one code block instead of copying my Jupyter blocks.

Thanks a lot for your help.
