Tullio and the nograd option

@ChrisRackauckas , @mcabbott,

I have a Tullio expression:

Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j].x * y[i]^I[j].y grad=Dual nograd=x nograd=y nograd=I

I assumed that the nograd option expressed that these arrays would be ignored when Zygote computes the gradient. If this is the case, then why would l get the follow error later in the code:

ERROR: Mutating arrays is not supported -- called push!(Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/latest/limitations

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _throw_mutation_error(f::Function, args::Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:86
  [3] (::Zygote.var"#397#398"{Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}})(#unused#::Nothing)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:105
  [4] (::Zygote.var"#2508#back#399"{Zygote.var"#397#398"{Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}}})(Ξ”::Nothing)

which clearly is a reflection that the array I is involved, and yet, I specifically targetted it with a nograd. Here is the definition of I:

    NT = NamedTuple{(:x,:y), Tuple{Int64, Int64}}
    I = Vector{NT}()
    for i in 0:degree
        for j in 0:degree
            if i + j < (degree+1)
                push!(I, (x=i, y=j))
            end
        end
    end

which is part of a more general custom polynomial layer in Lux.jl:

function generate_polynomial(N, degree, x, y)
    println("Enter Polynomial::generate_polynomial, length x, y: ", size(x), size(y)) # x: (128, 2), y: (2, 128)

    NT = NamedTuple{(:x,:y), Tuple{Int64, Int64}}
    I = Vector{NT}()
    for i in 0:degree
        for j in 0:degree
            if i + j < (degree+1)
                push!(I, (x=i, y=j))
            end
        end
    end

	coef = randn(length(I))
     println("=====> Polynomial, length coef: ", length(coef))

    Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j].x * y[i]^I[j].y grad=Dual nograd=x nograd=y nograd=I
    return Poly, coef
end

Thanks for any insights!

Here is additional information. I copied the offending array I onto a new array J, and converted J to a Tuple, which is by definition immutable:

    NT = NamedTuple{(:x,:y), Tuple{Int64, Int64}}
    I = Vector{NT}()
    for i in 0:degree
        for j in 0:degree
            if i + j < (degree+1)
                push!(I, (x=i, y=j))
            end
        end
    end

    J = Tuple(copy(I))  # This should resolve the mutability issue on I

	coef = randn(length(J))
     println("=====> Polynomial, length coef: ", length(coef))

    Poly(coef) = @tullio poly[i] := coef[j] * x[i]^J[j].x * y[i]^J[j].y grad=Dual nograd=x nograd=y nograd=J

When computing the derivative with Zygote.gradient, I still got the error stating the mutations were illegal:

Enter Polynomial::generate_polynomial, length x, y: (128,)(128,)
=====> Polynomial, length coef: 15
ERROR: Mutating arrays is not supported -- called push!(Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/latest/limitations

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _throw_mutation_error(f::Function, args::Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:86
  [3] (::Zygote.var"#397#398"{Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}})(#unused#::Nothing)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:105
  [4] (::Zygote.var"#2508#back#399"{Zygote.var"#397#398"{Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}}})(Ξ”::Nothing

This message clearly refers to I and not J because of the error:

 [2] _throw_mutation_error(f::Function, args::Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}})

I is a vector, while j is a Tuple.

I will probably have to create a MWE, which I am loathe to do because of the multiple interacting pieces. But here is an error occurring in Zygote, probably due to some kind of array content corruption. I ran with --inline=no and --bound-check=yes, and found that there was still inlining performed. Strange. I also tried with --optimization=0.

Here is the error, in case something jumps out:

ERROR: UndefVarError: 🐳 not defined
Stacktrace:
  [1] βˆ‡π’œπ’Έπ“‰!
    @ ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1211 [inlined]
  [2] tile_halves(fun!::var"#βˆ‡π’œπ’Έπ“‰!#15", ::Type{Union{Nothing, Vector{Any}}}, As::Tuple{Vector{Float64}, Nothing, Nothing, Nothing, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Tuple{Int64, Int64}}, Vector{Float64}}, Is::Tuple{}, Js::Tuple{UnitRange{Int64}, UnitRange{Int64}}, keep::Nothing, final::Nothing)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/threads.jl:139
  [3] tile_halves(fun!::var"#βˆ‡π’œπ’Έπ“‰!#15", ::Type{Union{Nothing, Vector{Any}}}, As::Tuple{Vector{Float64}, Nothing, Nothing, Nothing, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Tuple{Int64, Int64}}, Vector{Float64}}, Is::Tuple{}, Js::Tuple{UnitRange{Int64}, UnitRange{Int64}}, keep::Nothing, final::Nothing) (repeats 2 times)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/threads.jl:146
  [4] tile_halves
    @ ~/.julia/packages/Tullio/NGyNM/src/threads.jl:136 [inlined]
  [5] βˆ‡threader(fun!::var"#βˆ‡π’œπ’Έπ“‰!#15", #unused#::Type{Union{Nothing, Vector{Any}}}, As::Tuple{Vector{Float64}, Nothing, Nothing, Nothing, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Tuple{Int64, Int64}}, Vector{Float64}}, I0s::Tuple{}, J0s::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, block::Int64)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/threads.jl:103
  [6] (::var"#17#βˆ‡β„³π’Άπ“€β„―#16"{var"#βˆ‡π’œπ’Έπ“‰!#15"})(π›₯β„›::Vector{Float64}, β„›::Vector{Float64}, coef::Vector{Float64}, x::Vector{Float64}, I::Vector{Tuple{Int64, Int64}}, y::Vector{Float64})
    @ Main ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1300
  [7] tullio_back
    @ ~/.julia/packages/Tullio/NGyNM/src/eval.jl:53 [inlined]
  [8] (::Zygote.ZBack{Tullio.var"#tullio_back#156"{Tullio.Eval{var"#ℳ𝒢𝓀ℯ#17"{var"#π’œπ’Έπ“‰!#14"}, var"#17#βˆ‡β„³π’Άπ“€β„―#16"{var"#βˆ‡π’œπ’Έπ“‰!#15"}}, Tuple{Vector{Float64}, Vector{Float64}, Vector{Tuple{Int64, Int64}}, Vector{Float64}}, Vector{Float64}}})(dy::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/chainrules.jl:206
  [9] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/Polynomial.jl:25 [inlined]
 [10] (::typeof(βˆ‚(Ξ»)))(Ξ”::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [11] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/Polynomial_layer.jl:53 [inlined]
 [12] (::typeof(βˆ‚(Ξ»)))(Ξ”::Tuple{Vector{Float64}, Vector{Float64}, Nothing})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [13] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/testing_lux_layer_test.jl:78 [inlined]
 [14] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/testing_lux_layer_test.jl:132 [inlined]
 [15] (::typeof(βˆ‚(Ξ»)))(Ξ”::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [16] (::Zygote.var"#60#61"{typeof(βˆ‚(Ξ»))})(Ξ”::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:45
 [17] gradient(f::Function, args::ComponentVector{Float32, Vector{Float32}, Tuple{Axis{(coeffs = ViewAxis(1:30, ShapedAxis((2, 15), NamedTuple())),)}}})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:97
 [18] main(tstate::Lux.Training.TrainState{NamedTuple{(:coeffs,), Tuple{Matrix{Float32}}}, NamedTuple{(), Tuple{}}, NamedTuple{(:coeffs,), Tuple{Optimisers.Leaf{Adam{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}, Tuple{Float32, Float32}}}}}, Polylayer{typeof(Lux.rand32)}}, vjp::Lux.Training.ZygoteVJP, data::Tuple{Matrix{Float32}, Matrix{Float64}}, epochs::Int64)
    @ Main ~/src/2022/basic_UODE/custom_lux_layer/testing_lux_layer_test.jl:132
 [19] top-level scope
    @ ~/src/2022/basic_UODE/custom_lux_layer/testing_lux_layer_test.jl:156

I was able to resolve the mutation issue (which I still do not understand), via:

Ix = 0:degree
    Jx = 0:degree 
    #I = [(x=i,y=j) for i in Ix for j in Jx if i+j <= degree]
    I = [(i,j) for i in Ix for j in Jx if i+j <= degree]

	coef = randn(length(I))
     println("=====> Polynomial, length coef: ", length(coef))

    Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j][1] * y[i]^I[j][2] grad=Dual nograd=x nograd=y nograd=I

I still do not understand why the nograd=I was not respected. I probably misunderstand something.

Thanks,

Isn’t := the mutation operator for Tullio? Just make a new array instead?

No. := makes a new array.

For the original error, this is the crucial line:

i.e. the error is from push!(I, (x=i, y=j)), which mutates. This is outside Tullio’s sight and unaffected by its options.

I don’t know what code causes the other. It’s hard to guess without working examples. It would be helpful more broadly to clarify exactly what this function has to do – I seems constant, it’s not clear why you have to generate it while taking the gradient.

No, @tullio a[i] := ... does make a new array, while @tullio a[i] = ... [edit, typo!] mutates an existing one.

1 Like

@mcabbott ,

You mean that " tullio a[i] = ... mutates an existing one? I removed the colon, leaving just the equal sign.

Here is the problem. I wish to evaluate a multivariate polynomial vectorized over many evaluation points. I will create a more detailed post later this morning.

Regarding the mutation of I, I had applied the nograd option on the matrix I , so why would mutation be an issue?

In any case, I resolved the mutation on I via use of a list comprehension appropriately filtered during generation, without loops.

Gordon

Because Tullio’s options only apply to what it’s doing. Code outside the macro is completely unaffected.

Zygote will still try to differentiate everything, or at least to generate the code for doing so; it may not run it if I’s gradient is nothing.

One obvious answer is something like f(a,x,y) = @tullio z[i] := a[n+1, m+1] * x[i]^n * y[i]^m.

It is also possible that you should write the function for scalar x, y using say evalpoly, and then broadcast that.

Thanks, @mcabbott . I must say that I do not really understand the Broadcast function, and I do not (so far) consider it very flexible, probably due to a lack of understanding. I created a MWE (before acting on your last message) of the problem I am now confronted with. I consider a 1D polynomial in x with a single term: a constant. I still get the same error:

using Zygote
using Tullio

function generate_polynomial(N, degree, x, y)
    println("Enter Polynomial::generate_polynomial, length x, y: ", size(x), size(y)) # x: (128, 2), y: (2, 128)

    Ix = 0:degree
    Jx = 0:degree 

    # for 2D polynomials
    I = [(i,j) for i in Ix for j in Jx if i+j <= degree]   # Equivalent to the above loop

    # for 1D polynomials
    I = [i for i in Ix]

    # Construct a polynomial in x and y, according to the indices listed in I
    #Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j][1] * y[i]^I[j][2] grad=Dual nograd=x nograd=y nograd=I verbose=2

    # 1-D polynomial
    Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j][1]  grad=Dual nograd=x nograd=I verbose=2

    return Poly #, coef
end

N = 1
degree = 0 # single term
x = rand(N)
y = rand(N)
#poly, coef = generate_polynomial(N, degree, x, y);
poly = generate_polynomial(N, degree, x, y);
nb_terms = Int((degree+1)*(degree+2) / 2)
coef = rand(nb_terms)
loss = x -> sum(poly(x).^2)
Zygote.gradient(loss, coef)    # =======  ERROR

Here is the output:

 Info: >>>>> Preliminary expressions
β”‚   verbosetidy(ex_pre) =
β”‚    quote
β”‚        (all)((𝒢𝓍->begin
β”‚                        (axes)(𝒢𝓍) == (axes)((first)(I))
β”‚                    end), I) || (throw)("elements I[j] must be of uniform size")
β””    end
β”Œ Info: ===== Base actor 
β”‚   verbosetidy(ex_act) =
β”‚    quote
β”‚        local @inline(function π’œπ’Έπ“‰!(::Type, β„›::AbstractArray{𝒯}, coef, x, I, 𝒢𝓍i, 𝒢𝓍j, ♻️ = nothing, πŸ’€ = true) where 𝒯
β”‚                    @inbounds @fastmath(begin
β”‚                                for i = 𝒢𝓍i
β”‚                                    π’œπ’Έπ’Έ = if ♻️ === nothing
β”‚                                            zero(𝒯)
β”‚                                        else
β”‚                                            β„›[i]
β”‚                                        end
β”‚                                    for j = 𝒢𝓍j
β”‚                                        π’œπ’Έπ’Έ = π’œπ’Έπ’Έ + coef[j] * x[i] ^ (I[j])[1]
β”‚                                    end
β”‚                                    β„›[i] = π’œπ’Έπ’Έ
β”‚                                end
β”‚                            end)
β”‚                end)
β””    end
β”Œ Info: ===== Base actor (gradient using ForwardDiff)
β”‚   verbosetidy(ex_act) =
β”‚    quote
β”‚        local @inline(function βˆ‡π’œπ’Έπ“‰!(::Type, π›₯coef, π›₯x, π›₯I, π›₯β„›::AbstractArray{𝒯}, β„›, coef, x, I, 𝒢𝓍i, 𝒢𝓍j, ♻️ = nothing, πŸ’€ = true) where 𝒯
β”‚                    @inbounds @fastmath(begin
β”‚                                begin
β”‚                                    πœ€coef = ForwardDiff.Dual((zero)(𝒯), ((one)(𝒯),))
β”‚                                end
β”‚                                begin
β”‚                                    for j = 𝒢𝓍j
β”‚                                        for i = 𝒢𝓍i
β”‚                                            β„› = (coef[j] + πœ€coef) * x[i] ^ 🐳
β”‚                                            π›₯coef[j] = π›₯coef[j] + ForwardDiff.partials(β„›, 1) * π›₯β„›[i]
β”‚                                        end
β”‚                                    end
β”‚                                end
β”‚                            end)
β”‚                end)
β””    end
β”Œ Warning: can't parallelise this gradient, no shared indices (gradient using ForwardDiff)
β”” @ Tullio ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1061
β”Œ Info: ===== Base actor (gradient method for FillArrays)
β”‚   verbosetidy(ex_act) =
β”‚    quote
β”‚        local @inline(function βˆ‡π’œπ’Έπ“‰!(::Type, π›₯coef, π›₯x, π›₯I, π›₯β„›::Zygote.Fill{𝒯}, β„›, coef, x, I, 𝒢𝓍i, 𝒢𝓍j, ♻️ = nothing, πŸ’€ = true) where 𝒯
β”‚                    @inbounds @fastmath(begin
β”‚                                begin
β”‚                                    πœ€coef = ForwardDiff.Dual((zero)(𝒯), ((one)(𝒯),))
β”‚                                    π›₯β„›_value = π›₯β„›.value
β”‚                                end
β”‚                                begin
β”‚                                    for j = 𝒢𝓍j
β”‚                                        for i = 𝒢𝓍i
β”‚                                            β„› = (coef[j] + πœ€coef) * x[i] ^ 🐳
β”‚                                            π›₯coef[j] = π›₯coef[j] + ForwardDiff.partials(β„›, 1) * π›₯β„›_value
β”‚                                        end
β”‚                                    end
β”‚                                end
β”‚                            end)
β”‚                end)
β””    end
β”Œ Warning: can't parallelise this gradient, no shared indices (gradient method for FillArrays)
β”” @ Tullio ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1061
[ Info: using ForwardDiff gradient
β”Œ Info: <<<<< Gradient maker function
β”‚   verbosetidy(ex_make) =
β”‚    quote
β”‚        local function βˆ‡β„³π’Άπ“€β„―(π›₯β„›::AbstractArray{𝒯}, β„›, coef, x, I) where 𝒯
β”‚                local π›₯coef = fill!(similar(coef, Base.promote_type(eltype(coef), 𝒯)), 0)
β”‚                local π›₯x = nothing
β”‚                local π›₯I = nothing
β”‚                local 𝒢𝓍j = (Tullio.linearindex)(coef)
β”‚                (Tullio.linearindex)(I) == (Tullio.linearindex)(coef) || (throw)("range of index j must agree")
β”‚                local 𝒢𝓍i = (Tullio.linearindex)(x)
β”‚                (Tullio.βˆ‡threader)(βˆ‡π’œπ’Έπ“‰!, (Tullio.storage_type)(π›₯coef, π›₯x, π›₯I, coef, x, I), tuple(π›₯coef, π›₯x, π›₯I, π›₯β„›, β„›, coef, x, I), tuple(), tuple(𝒢𝓍i, 𝒢𝓍j), 87382)
β”‚                return (π›₯coef, π›₯x, π›₯I)
β”‚            end
β””    end
β”Œ Info: threading threshold (from cost = 3)
β””   block = 87382
β”Œ Info: >>>>> Maker function
β”‚   verbosetidy(ex_make) =
β”‚    quote
β”‚        local @inline(function ℳ𝒢𝓀ℯ(coef, x, I)
β”‚                    local 𝒢𝓍j = (Tullio.linearindex)(coef)
β”‚                    (Tullio.linearindex)(I) == (Tullio.linearindex)(coef) || (throw)("range of index j must agree")
β”‚                    local 𝒢𝓍i = (Tullio.linearindex)(x)
β”‚                    @info "left index ranges" i = (Tullio.linearindex)(x)
β”‚                    @info "reduction index ranges" j = (Tullio.linearindex)(coef)
β”‚                    local π“‡π’½π“ˆ(coef, x, I, j, i) = begin
β”‚                                identity(coef[j] * x[i] ^ (I[j])[1])
β”‚                            end
β”‚                    begin
β”‚                        local 𝒯1 = Core.Compiler.return_type(π“‡π’½π“ˆ, (typeof)((coef, x, I, (first)(𝒢𝓍j), (first)(𝒢𝓍i))))
β”‚                        local 𝒯2 = if Base.isconcretetype(𝒯1)
β”‚                                    𝒯1
β”‚                                else
β”‚                                    @warn "unable to infer eltype from RHS"
β”‚                                    (typeof)(π“‡π’½π“ˆ(coef, x, I, (first)(𝒢𝓍j), (first)(𝒢𝓍i)))
β”‚                                end
β”‚                    end
β”‚                    local 𝒯3 = 𝒯2
β”‚                    local 𝒯 = 𝒯3
β”‚                    local poly = similar(coef, 𝒯, tuple(𝒢𝓍i))
β”‚                    begin
β”‚                        (Tullio.threader)(π’œπ’Έπ“‰!, (Tullio.storage_type)(poly, coef, x, I), poly, tuple(coef, x, I), tuple(𝒢𝓍i), tuple(𝒢𝓍j), +, 87382, nothing)
β”‚                        poly
β”‚                    end
β”‚                end)
β””    end
β”Œ store.
β”‚   arrays = [:coef, :x, :I]
β”‚   avx = true
β”‚   axisdefs = Expr[:(local 𝒢𝓍j = (Tullio.linearindex)(coef)), :((Tullio.linearindex)(I) == (Tullio.linearindex)(coef) || (throw)("range of index j must agree")), :(local 𝒢𝓍i = (Tullio.linearindex)(x))]
β”‚   constraintpairs = Tuple[]
β”‚   constraints = Dict{Symbol, Vector}(:j => Any[:((Tullio.linearindex)(coef)), :((Tullio.linearindex)(I))], :i => Any[:((Tullio.linearindex)(x))])
β”‚   cost = 3
β”‚   cuda = true
β”‚   fastmath = true
β”‚   finaliser = :identity
β”‚   grad = :Dual
β”‚   init = :(zero(𝒯))
β”‚   initkeyword = :𝒯
β”‚   leftarray = :poly
β”‚   leftind = [:i]
β”‚   leftnames = Symbol[]
β”‚   leftraw = Any[:i]
β”‚   leftscalar = nothing
β”‚   mod = Main
β”‚   newarray = true
β”‚   nograd = [:x, :I]
β”‚   notfree = Symbol[]
β”‚   outex = Expr[:(local 𝒢𝓍j = (Tullio.linearindex)(coef)), :((Tullio.linearindex)(I) == (Tullio.linearindex)(coef) || (throw)("range of index j must agree")), :( ...
β”‚   outpre = Expr[:((all)((𝒢𝓍->begin
                  (axes)(𝒢𝓍) == (axes)((first)(I))
              end), I) || (throw)("elements I[j] must be of uniform size")) ...
β”‚   padkeyword = :𝒯
β”‚   padmodclamp = false
β”‚   plusequals = false
β”‚   redfun = :+
β”‚   redind = [:j]
β”‚   right = :(coef[j] * x[i] ^ (I[j])[1])
β”‚   rightind = [:j, :i]
β”‚   scalar = false
β”‚   scalars = Symbol[]
β”‚   sharedind = Symbol[]
β”‚   shiftedind = Symbol[]
β”‚   threads = true
β”‚   unsafeleft = Symbol[]
β”‚   unsaferight = Symbol[]
β”‚   verbose = 2
β”‚   zero = false
β””
generate_polynomial

1

0

1-element Vector{Float64}:
 0.7905131946859776

1-element Vector{Float64}:
 0.9845248220058904

Enter Polynomial::generate_polynomial, length x, y: (1,)(1,)

1

1-element Vector{Float64}:
 0.15588748892996085

#190 (generic function with 1 method)

β”Œ Info: left index ranges
β””   i = Base.OneTo(1)
β”Œ Info: reduction index ranges
β””   j = Base.OneTo(1)
ERROR: UndefVarError: 🐳 not defined
Stacktrace:
  [1] βˆ‡π’œπ’Έπ“‰!
    @ ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1211 [inlined]
  [2] tile_halves(fun!::var"#βˆ‡π’œπ’Έπ“‰!#186", ::Type{Union{Nothing, Vector{Float64}}}, As::Tuple{Vector{Float64}, Nothing, Nothing, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Int64}}, Is::Tuple{}, Js::Tuple{UnitRange{Int64}, UnitRange{Int64}}, keep::Nothing, final::Bool)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/threads.jl:139
  [3] tile_halves
    @ ~/.julia/packages/Tullio/NGyNM/src/threads.jl:136 [inlined]
  [4] βˆ‡threader(fun!::var"#βˆ‡π’œπ’Έπ“‰!#186", #unused#::Type{Union{Nothing, Vector{Float64}}}, As::Tuple{Vector{Float64}, Nothing, Nothing, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Int64}}, I0s::Tuple{}, J0s::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, block::Int64)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/threads.jl:103
  [5] (::var"#942#βˆ‡β„³π’Άπ“€β„―#187"{var"#βˆ‡π’œπ’Έπ“‰!#186"})(π›₯β„›::Vector{Float64}, β„›::Vector{Float64}, coef::Vector{Float64}, x::Vector{Float64}, I::Vector{Int64})
    @ Main ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1300
  [6] tullio_back
    @ ~/.julia/packages/Tullio/NGyNM/src/eval.jl:53 [inlined]
  [7] (::Zygote.ZBack{Tullio.var"#tullio_back#156"{Tullio.Eval{var"#ℳ𝒢𝓀ℯ#188"{var"#π’œπ’Έπ“‰!#185"}, var"#942#βˆ‡β„³π’Άπ“€β„―#187"{var"#βˆ‡π’œπ’Έπ“‰!#186"}}, Tuple{Vector{Float64}, Vector{Float64}, Vector{Int64}}, Vector{Float64}}})(dy::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/chainrules.jl:206
  [8] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/tullio_zygote_mwe.jl:34 [inlined]
  [9] (::typeof(βˆ‚(Ξ»)))(Ξ”::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [10] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/tullio_zygote_mwe.jl:48 [inlined]
 [11] (::typeof(βˆ‚(#190)))(Ξ”::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [12] (::Zygote.var"#60#61"{typeof(βˆ‚(#190))})(Ξ”::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:45
 [13] gradient(f::Function, args::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:97
 [14] top-level scope
    @ ~/src/2022/basic_UODE/custom_lux_layer/tullio_zygote_mwe.jl:49

One can see the appearance of the \teapot symbol during the derivative calculation without it having been defined. Where is comes from I have no idea! (@ChrisRackauckas)

Could I write your expression as follows? Notice the conditional at the end. The answer is no, I can’t.

 f(a,x,y) = @tullio z[i] := a[n+1, m+1] * x[i]^n * y[i]^m    if n + m <= 3

Two points though: 1st, I may wish to skip some indices so listing the desired indexes in a list seems appropriate. Second, I may wish to combine monomials and other more complex basis functions, perhaps parametrized, for example: a cos\omega t which has two parameters I wish to estimate. So would rather stick to the more general solution if possible. Your solution works if that is what I wanted to do.

I believe the issue is in Zygote, or in ChainRules? Perhaps the expression

x^(I[j][1]) 

is too complex to parse? Perhaps the indexing [1] is the culprit? That would suggest an even more MWE.

The following worked, so the issue is likely either with me, or with Tullio, probably in that order. :slight_smile:

I = [(1,2), (2,3)]
x = rand(5)
loss = x -> sum(x.^(I[1][2]) + x.^(I[2][1]))
Zygote.gradient(loss, x)

@ChrisRackauckas, @mcabbott

I believe I have identified the error. The following leads to an error with Zygote:

 I = [(i) for i in Ix]
 Poly(coef) = @tullio poly[i] := coef[j] * x[i]^(I[j][1])  grad=Dual nograd=x nograd=I verbose=2

But the following works (I changed I[j][1] to I[j]

 I = [i for i in Ix]
 Poly(coef) = @tullio poly[i] := coef[j] * x[i]^(I[j])  grad=Dual nograd=x nograd=I verbose=2

To me, this suggests a missing component or error in the chain rules used by Tullio, or perhaps I am using a construct that is simply not acceptable. In the meantime, I have a fix: replace the list of tuples of length two by two lists of integers. That should resolve the problem, but this would create problems if I wish to generalize to multigression with a higher number of variables.

I checked that the following works:

function generate_polynomial(N, degree, x, y)

    Ix = 0:degree
    Iy = 0:degree

    # for 1D polynomials
    IJ = [(i,j) for i in Ix for j in Iy if i + j < (degree+1)]
    I = [i[1] for i in IJ]
    J = [i[2] for i in IJ]

     Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j] * y[i]^J[j] grad=Dual nograd=x nograd=y nograd=I nograd=J

    return Poly 
end

N = 128
degree = 4 # single term
x = rand(N)
y = rand(N)
poly = generate_polynomial(N, degree, x, y);
nb_terms = Int((degree+1)*(degree+2) / 2)
coef = rand(nb_terms)
loss = x -> sum(poly(x).^2)
Zygote.gradient(loss, coef)

So if could resolve the original problem, that would be great.

Thanks as usual!

I have no memory of this but the fail whale seems to come from here:

which is trying to say that indexing two levels deep I[i][1] is too hard for the grad=Dual method to differentiate. (Probably it should be an error, not @debug.)

You can work around it by not making deeply nested structures, e.g. making a tuple of powers for x, and a separate one for y.

Maybe. I mean evaluating polynomials with all coefficients is always going to be more straightforward than evaluating them with some subset of the possible coefficients – that’s what you’re trying to do with this list of powers. If the only restriction is m+n<4 then at best this is an optimisation which may save you a factor of 2, compared to just keeping the zero coefficients. I suspect there are much larger factors around.

This is a harder problem than polynomials, and the design is likely to be quite different – involving some tuple of functions, perhaps.

Yep, see further message. I figured that out already! This was another exercise in MWE construction and debugging!

@ChrisRackauckas, @mcabbott

I tried using macros to generate my lists, as follows:

macro IJ(n::Symbol) # n is degree
    return esc(quote
        f = []
        I = Vector{Int}()
        J = Vector{Int}()
        if $n < 0
            print("The input must be equal to or bigger than zero.") 
        else  
            for i in 0:$n
            for j in 0:$n
                if i+j <= $n
                    push!(I, i)
                    push!(J, j)
                end
            end
            end
        end
        (I, J)
    end)
end

and then the polynomial generator becomes:

function generate_polynomial(degree)
    if degree != 2
        println("Degree must be 2 or 3 for hardcode testing")
    end

    # Hardcode the indices, and see if mutation error goes away
    I, J = @IJ degree
    Poly(coef, x, y) = @tullio poly[i] := coef[j] * x[i]^I[j] * y[i]^J[j] grad=Dual nograd=x nograd=y nograd=I nograd=J 
end

The following then works:

N = 128
d = 2 
poly = generate_polynomial(d)
nb_terms = Int((d+1)*(d+2)/2)
Zygote.gradient((coef, x, y) -> (sum∘poly)(coef, x, y), rand(nb_terms), rand(N), rand(N))

However, if I execute the gradient indirectly as follows: (see earlier messages for more context)

Zygote.gradient((coef, data) -> loss_function(model, coef, data)[1], ps, randn(128,2))

where the loss_function is:

function loss_function(model, ps, data)
    global st   # st was already global, so why is global necessary?
    y_pred, _ = model(data, ps, st)  # Model should return size (model.out_dims, N). Update global st
    mse_loss = mean(abs2, y_pred)     # mutated data?
    return mse_loss, ()   # what is ()? 
end

then I get a mutation error:

ERROR: Mutating arrays is not supported -- called push!(Vector{Int64}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/latest/limitations

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _throw_mutation_error(f::Function, args::Vector{Int64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:86
  [3] (::Zygote.var"#397#398"{Vector{Int64}})(#unused#::Nothing)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:105
  [4] (::Zygote.var"#2508#back#399"{Zygote.var"#397#398"{Vector{Int64}}})(Ξ”::Nothing)
    @ Zygote ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67
  [5] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/MWE_training_loop.jl:24 [inlined]
  [6] (::typeof(βˆ‚(generate_polynomial)))(Ξ”::Nothing)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
  [7] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/MWE_training_loop.jl:110 [inlined]
  [8] (::typeof(βˆ‚(Ξ»)))(Ξ”::Tuple{Matrix{Float64}, Nothing})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
  [9] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/MWE_training_loop.jl:175 [inlined]
 [10] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/MWE_training_loop.jl:193 [inlined]
 [11] (::typeof(βˆ‚(#32)))(Ξ”::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [12] (::Zygote.var"#60#61"{typeof(βˆ‚(#32))})(Ξ”::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:45
 [13] gradient(::Function, ::ComponentVector{Float32, Vector{Float32}, Tuple{Axis{(coeffs = ViewAxis(1:12, ShapedAxis((6, 2), NamedTuple())),)}}}, ::Vararg{Any})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:97
 [14] top-level scope
    @ ~/src/2022/basic_UODE/custom_lux_layer/MWE_training_loop.jl:193

Question: would this mutation error have anything to do with either of the macros? I would say that is not possible since the macros are created expressly to allow complex to be created and then used as if typed by hand, and also because the Zygote gradient works when applied outside a function.

I hardcoded the lists inside the generate_polynomials function:

if degree == 2
        IJ = [(0,0), (1,0), (2,0), (0,1), (0,2), (1,1)]
        I = (0,1,2,0,0,1)
        J = (0,0,0,1,2,1)
    elseif degree == 3
        IJ = [(0,0), (1,0), (0,1), (2,0), (1,1), (0,2), (3,0), (2,1), (1,2), (0,3)]
        I = (0, 1, 0, 2, 1, 0, 3, 2, 1, 0)
        J = (0, 0, 1, 0, 1, 2, 0, 1, 2, 3)
    else
        println("Degree greater than 3 not implemented")
    end

and removed the macros. In that case, both computations of the gradient worked fine.

What could be the issue? Why are macros not completely eliminating the mutation issue? Thanks.