Tullio and the nograd option

erlebach · December 26, 2022, 3:50am

I have a Tullio expression:

Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j].x * y[i]^I[j].y grad=Dual nograd=x nograd=y nograd=I

I assumed that the nograd option expressed that these arrays would be ignored when Zygote computes the gradient. If this is the case, then why would l get the follow error later in the code:

ERROR: Mutating arrays is not supported -- called push!(Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/latest/limitations

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _throw_mutation_error(f::Function, args::Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:86
  [3] (::Zygote.var"#397#398"{Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}})(#unused#::Nothing)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:105
  [4] (::Zygote.var"#2508#back#399"{Zygote.var"#397#398"{Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}}})(Δ::Nothing)

which clearly is a reflection that the array I is involved, and yet, I specifically targetted it with a nograd. Here is the definition of I:

    NT = NamedTuple{(:x,:y), Tuple{Int64, Int64}}
    I = Vector{NT}()
    for i in 0:degree
        for j in 0:degree
            if i + j < (degree+1)
                push!(I, (x=i, y=j))
            end
        end
    end

which is part of a more general custom polynomial layer in Lux.jl:

function generate_polynomial(N, degree, x, y)
    println("Enter Polynomial::generate_polynomial, length x, y: ", size(x), size(y)) # x: (128, 2), y: (2, 128)

    NT = NamedTuple{(:x,:y), Tuple{Int64, Int64}}
    I = Vector{NT}()
    for i in 0:degree
        for j in 0:degree
            if i + j < (degree+1)
                push!(I, (x=i, y=j))
            end
        end
    end

	coef = randn(length(I))
     println("=====> Polynomial, length coef: ", length(coef))

    Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j].x * y[i]^I[j].y grad=Dual nograd=x nograd=y nograd=I
    return Poly, coef
end

Thanks for any insights!

erlebach · December 26, 2022, 4:13am

Here is additional information. I copied the offending array I onto a new array J, and converted J to a Tuple, which is by definition immutable:

    NT = NamedTuple{(:x,:y), Tuple{Int64, Int64}}
    I = Vector{NT}()
    for i in 0:degree
        for j in 0:degree
            if i + j < (degree+1)
                push!(I, (x=i, y=j))
            end
        end
    end

    J = Tuple(copy(I))  # This should resolve the mutability issue on I

	coef = randn(length(J))
     println("=====> Polynomial, length coef: ", length(coef))

    Poly(coef) = @tullio poly[i] := coef[j] * x[i]^J[j].x * y[i]^J[j].y grad=Dual nograd=x nograd=y nograd=J

When computing the derivative with Zygote.gradient, I still got the error stating the mutations were illegal:

Enter Polynomial::generate_polynomial, length x, y: (128,)(128,)
=====> Polynomial, length coef: 15
ERROR: Mutating arrays is not supported -- called push!(Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/latest/limitations

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _throw_mutation_error(f::Function, args::Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:86
  [3] (::Zygote.var"#397#398"{Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}})(#unused#::Nothing)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:105
  [4] (::Zygote.var"#2508#back#399"{Zygote.var"#397#398"{Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}}}})(Δ::Nothing

This message clearly refers to I and not J because of the error:

 [2] _throw_mutation_error(f::Function, args::Vector{NamedTuple{(:x, :y), Tuple{Int64, Int64}}})

I is a vector, while j is a Tuple.

erlebach · December 26, 2022, 5:33am

I will probably have to create a MWE, which I am loathe to do because of the multiple interacting pieces. But here is an error occurring in Zygote, probably due to some kind of array content corruption. I ran with --inline=no and --bound-check=yes, and found that there was still inlining performed. Strange. I also tried with --optimization=0.

Here is the error, in case something jumps out:

ERROR: UndefVarError: 🐳 not defined
Stacktrace:
  [1] ∇𝒜𝒸𝓉!
    @ ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1211 [inlined]
  [2] tile_halves(fun!::var"#∇𝒜𝒸𝓉!#15", ::Type{Union{Nothing, Vector{Any}}}, As::Tuple{Vector{Float64}, Nothing, Nothing, Nothing, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Tuple{Int64, Int64}}, Vector{Float64}}, Is::Tuple{}, Js::Tuple{UnitRange{Int64}, UnitRange{Int64}}, keep::Nothing, final::Nothing)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/threads.jl:139
  [3] tile_halves(fun!::var"#∇𝒜𝒸𝓉!#15", ::Type{Union{Nothing, Vector{Any}}}, As::Tuple{Vector{Float64}, Nothing, Nothing, Nothing, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Tuple{Int64, Int64}}, Vector{Float64}}, Is::Tuple{}, Js::Tuple{UnitRange{Int64}, UnitRange{Int64}}, keep::Nothing, final::Nothing) (repeats 2 times)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/threads.jl:146
  [4] tile_halves
    @ ~/.julia/packages/Tullio/NGyNM/src/threads.jl:136 [inlined]
  [5] ∇threader(fun!::var"#∇𝒜𝒸𝓉!#15", #unused#::Type{Union{Nothing, Vector{Any}}}, As::Tuple{Vector{Float64}, Nothing, Nothing, Nothing, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Tuple{Int64, Int64}}, Vector{Float64}}, I0s::Tuple{}, J0s::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, block::Int64)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/threads.jl:103
  [6] (::var"#17#∇ℳ𝒶𝓀ℯ#16"{var"#∇𝒜𝒸𝓉!#15"})(𝛥ℛ::Vector{Float64}, ℛ::Vector{Float64}, coef::Vector{Float64}, x::Vector{Float64}, I::Vector{Tuple{Int64, Int64}}, y::Vector{Float64})
    @ Main ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1300
  [7] tullio_back
    @ ~/.julia/packages/Tullio/NGyNM/src/eval.jl:53 [inlined]
  [8] (::Zygote.ZBack{Tullio.var"#tullio_back#156"{Tullio.Eval{var"#ℳ𝒶𝓀ℯ#17"{var"#𝒜𝒸𝓉!#14"}, var"#17#∇ℳ𝒶𝓀ℯ#16"{var"#∇𝒜𝒸𝓉!#15"}}, Tuple{Vector{Float64}, Vector{Float64}, Vector{Tuple{Int64, Int64}}, Vector{Float64}}, Vector{Float64}}})(dy::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/chainrules.jl:206
  [9] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/Polynomial.jl:25 [inlined]
 [10] (::typeof(∂(λ)))(Δ::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [11] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/Polynomial_layer.jl:53 [inlined]
 [12] (::typeof(∂(λ)))(Δ::Tuple{Vector{Float64}, Vector{Float64}, Nothing})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [13] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/testing_lux_layer_test.jl:78 [inlined]
 [14] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/testing_lux_layer_test.jl:132 [inlined]
 [15] (::typeof(∂(λ)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [16] (::Zygote.var"#60#61"{typeof(∂(λ))})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:45
 [17] gradient(f::Function, args::ComponentVector{Float32, Vector{Float32}, Tuple{Axis{(coeffs = ViewAxis(1:30, ShapedAxis((2, 15), NamedTuple())),)}}})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:97
 [18] main(tstate::Lux.Training.TrainState{NamedTuple{(:coeffs,), Tuple{Matrix{Float32}}}, NamedTuple{(), Tuple{}}, NamedTuple{(:coeffs,), Tuple{Optimisers.Leaf{Adam{Float32}, Tuple{Matrix{Float32}, Matrix{Float32}, Tuple{Float32, Float32}}}}}, Polylayer{typeof(Lux.rand32)}}, vjp::Lux.Training.ZygoteVJP, data::Tuple{Matrix{Float32}, Matrix{Float64}}, epochs::Int64)
    @ Main ~/src/2022/basic_UODE/custom_lux_layer/testing_lux_layer_test.jl:132
 [19] top-level scope
    @ ~/src/2022/basic_UODE/custom_lux_layer/testing_lux_layer_test.jl:156

I was able to resolve the mutation issue (which I still do not understand), via:

Ix = 0:degree
    Jx = 0:degree 
    #I = [(x=i,y=j) for i in Ix for j in Jx if i+j <= degree]
    I = [(i,j) for i in Ix for j in Jx if i+j <= degree]

	coef = randn(length(I))
     println("=====> Polynomial, length coef: ", length(coef))

    Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j][1] * y[i]^I[j][2] grad=Dual nograd=x nograd=y nograd=I

I still do not understand why the nograd=I was not respected. I probably misunderstand something.

Thanks,

ChrisRackauckas · December 26, 2022, 12:26pm

Isn’t := the mutation operator for Tullio? Just make a new array instead?

erlebach · December 26, 2022, 12:44pm

No. := makes a new array.

mcabbott · December 26, 2022, 12:48pm

For the original error, this is the crucial line:

i.e. the error is from push!(I, (x=i, y=j)), which mutates. This is outside Tullio’s sight and unaffected by its options.

I don’t know what code causes the other. It’s hard to guess without working examples. It would be helpful more broadly to clarify exactly what this function has to do – I seems constant, it’s not clear why you have to generate it while taking the gradient.

No, @tullio a[i] := ... does make a new array, while @tullio a[i] = ... [edit, typo!] mutates an existing one.

erlebach · December 26, 2022, 12:59pm

@mcabbott ,

You mean that " tullio a[i] = ... mutates an existing one? I removed the colon, leaving just the equal sign.

Here is the problem. I wish to evaluate a multivariate polynomial vectorized over many evaluation points. I will create a more detailed post later this morning.

Regarding the mutation of I, I had applied the nograd option on the matrix I , so why would mutation be an issue?

In any case, I resolved the mutation on I via use of a list comprehension appropriately filtered during generation, without loops.

Gordon

mcabbott · December 26, 2022, 1:20pm

Because Tullio’s options only apply to what it’s doing. Code outside the macro is completely unaffected.

Zygote will still try to differentiate everything, or at least to generate the code for doing so; it may not run it if I’s gradient is nothing.

One obvious answer is something like f(a,x,y) = @tullio z[i] := a[n+1, m+1] * x[i]^n * y[i]^m.

It is also possible that you should write the function for scalar x, y using say evalpoly, and then broadcast that.

erlebach · December 26, 2022, 1:40pm

Thanks, @mcabbott . I must say that I do not really understand the Broadcast function, and I do not (so far) consider it very flexible, probably due to a lack of understanding. I created a MWE (before acting on your last message) of the problem I am now confronted with. I consider a 1D polynomial in x with a single term: a constant. I still get the same error:

using Zygote
using Tullio

function generate_polynomial(N, degree, x, y)
    println("Enter Polynomial::generate_polynomial, length x, y: ", size(x), size(y)) # x: (128, 2), y: (2, 128)

    Ix = 0:degree
    Jx = 0:degree 

    # for 2D polynomials
    I = [(i,j) for i in Ix for j in Jx if i+j <= degree]   # Equivalent to the above loop

    # for 1D polynomials
    I = [i for i in Ix]

    # Construct a polynomial in x and y, according to the indices listed in I
    #Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j][1] * y[i]^I[j][2] grad=Dual nograd=x nograd=y nograd=I verbose=2

    # 1-D polynomial
    Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j][1]  grad=Dual nograd=x nograd=I verbose=2

    return Poly #, coef
end

N = 1
degree = 0 # single term
x = rand(N)
y = rand(N)
#poly, coef = generate_polynomial(N, degree, x, y);
poly = generate_polynomial(N, degree, x, y);
nb_terms = Int((degree+1)*(degree+2) / 2)
coef = rand(nb_terms)
loss = x -> sum(poly(x).^2)
Zygote.gradient(loss, coef)    # =======  ERROR

Here is the output:

 Info: >>>>> Preliminary expressions
│   verbosetidy(ex_pre) =
│    quote
│        (all)((𝒶𝓍->begin
│                        (axes)(𝒶𝓍) == (axes)((first)(I))
│                    end), I) || (throw)("elements I[j] must be of uniform size")
└    end
┌ Info: ===== Base actor 
│   verbosetidy(ex_act) =
│    quote
│        local @inline(function 𝒜𝒸𝓉!(::Type, ℛ::AbstractArray{𝒯}, coef, x, I, 𝒶𝓍i, 𝒶𝓍j, ♻️ = nothing, 💀 = true) where 𝒯
│                    @inbounds @fastmath(begin
│                                for i = 𝒶𝓍i
│                                    𝒜𝒸𝒸 = if ♻️ === nothing
│                                            zero(𝒯)
│                                        else
│                                            ℛ[i]
│                                        end
│                                    for j = 𝒶𝓍j
│                                        𝒜𝒸𝒸 = 𝒜𝒸𝒸 + coef[j] * x[i] ^ (I[j])[1]
│                                    end
│                                    ℛ[i] = 𝒜𝒸𝒸
│                                end
│                            end)
│                end)
└    end
┌ Info: ===== Base actor (gradient using ForwardDiff)
│   verbosetidy(ex_act) =
│    quote
│        local @inline(function ∇𝒜𝒸𝓉!(::Type, 𝛥coef, 𝛥x, 𝛥I, 𝛥ℛ::AbstractArray{𝒯}, ℛ, coef, x, I, 𝒶𝓍i, 𝒶𝓍j, ♻️ = nothing, 💀 = true) where 𝒯
│                    @inbounds @fastmath(begin
│                                begin
│                                    𝜀coef = ForwardDiff.Dual((zero)(𝒯), ((one)(𝒯),))
│                                end
│                                begin
│                                    for j = 𝒶𝓍j
│                                        for i = 𝒶𝓍i
│                                            ℛ = (coef[j] + 𝜀coef) * x[i] ^ 🐳
│                                            𝛥coef[j] = 𝛥coef[j] + ForwardDiff.partials(ℛ, 1) * 𝛥ℛ[i]
│                                        end
│                                    end
│                                end
│                            end)
│                end)
└    end
┌ Warning: can't parallelise this gradient, no shared indices (gradient using ForwardDiff)
└ @ Tullio ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1061
┌ Info: ===== Base actor (gradient method for FillArrays)
│   verbosetidy(ex_act) =
│    quote
│        local @inline(function ∇𝒜𝒸𝓉!(::Type, 𝛥coef, 𝛥x, 𝛥I, 𝛥ℛ::Zygote.Fill{𝒯}, ℛ, coef, x, I, 𝒶𝓍i, 𝒶𝓍j, ♻️ = nothing, 💀 = true) where 𝒯
│                    @inbounds @fastmath(begin
│                                begin
│                                    𝜀coef = ForwardDiff.Dual((zero)(𝒯), ((one)(𝒯),))
│                                    𝛥ℛ_value = 𝛥ℛ.value
│                                end
│                                begin
│                                    for j = 𝒶𝓍j
│                                        for i = 𝒶𝓍i
│                                            ℛ = (coef[j] + 𝜀coef) * x[i] ^ 🐳
│                                            𝛥coef[j] = 𝛥coef[j] + ForwardDiff.partials(ℛ, 1) * 𝛥ℛ_value
│                                        end
│                                    end
│                                end
│                            end)
│                end)
└    end
┌ Warning: can't parallelise this gradient, no shared indices (gradient method for FillArrays)
└ @ Tullio ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1061
[ Info: using ForwardDiff gradient
┌ Info: <<<<< Gradient maker function
│   verbosetidy(ex_make) =
│    quote
│        local function ∇ℳ𝒶𝓀ℯ(𝛥ℛ::AbstractArray{𝒯}, ℛ, coef, x, I) where 𝒯
│                local 𝛥coef = fill!(similar(coef, Base.promote_type(eltype(coef), 𝒯)), 0)
│                local 𝛥x = nothing
│                local 𝛥I = nothing
│                local 𝒶𝓍j = (Tullio.linearindex)(coef)
│                (Tullio.linearindex)(I) == (Tullio.linearindex)(coef) || (throw)("range of index j must agree")
│                local 𝒶𝓍i = (Tullio.linearindex)(x)
│                (Tullio.∇threader)(∇𝒜𝒸𝓉!, (Tullio.storage_type)(𝛥coef, 𝛥x, 𝛥I, coef, x, I), tuple(𝛥coef, 𝛥x, 𝛥I, 𝛥ℛ, ℛ, coef, x, I), tuple(), tuple(𝒶𝓍i, 𝒶𝓍j), 87382)
│                return (𝛥coef, 𝛥x, 𝛥I)
│            end
└    end
┌ Info: threading threshold (from cost = 3)
└   block = 87382
┌ Info: >>>>> Maker function
│   verbosetidy(ex_make) =
│    quote
│        local @inline(function ℳ𝒶𝓀ℯ(coef, x, I)
│                    local 𝒶𝓍j = (Tullio.linearindex)(coef)
│                    (Tullio.linearindex)(I) == (Tullio.linearindex)(coef) || (throw)("range of index j must agree")
│                    local 𝒶𝓍i = (Tullio.linearindex)(x)
│                    @info "left index ranges" i = (Tullio.linearindex)(x)
│                    @info "reduction index ranges" j = (Tullio.linearindex)(coef)
│                    local 𝓇𝒽𝓈(coef, x, I, j, i) = begin
│                                identity(coef[j] * x[i] ^ (I[j])[1])
│                            end
│                    begin
│                        local 𝒯1 = Core.Compiler.return_type(𝓇𝒽𝓈, (typeof)((coef, x, I, (first)(𝒶𝓍j), (first)(𝒶𝓍i))))
│                        local 𝒯2 = if Base.isconcretetype(𝒯1)
│                                    𝒯1
│                                else
│                                    @warn "unable to infer eltype from RHS"
│                                    (typeof)(𝓇𝒽𝓈(coef, x, I, (first)(𝒶𝓍j), (first)(𝒶𝓍i)))
│                                end
│                    end
│                    local 𝒯3 = 𝒯2
│                    local 𝒯 = 𝒯3
│                    local poly = similar(coef, 𝒯, tuple(𝒶𝓍i))
│                    begin
│                        (Tullio.threader)(𝒜𝒸𝓉!, (Tullio.storage_type)(poly, coef, x, I), poly, tuple(coef, x, I), tuple(𝒶𝓍i), tuple(𝒶𝓍j), +, 87382, nothing)
│                        poly
│                    end
│                end)
└    end
┌ store.
│   arrays = [:coef, :x, :I]
│   avx = true
│   axisdefs = Expr[:(local 𝒶𝓍j = (Tullio.linearindex)(coef)), :((Tullio.linearindex)(I) == (Tullio.linearindex)(coef) || (throw)("range of index j must agree")), :(local 𝒶𝓍i = (Tullio.linearindex)(x))]
│   constraintpairs = Tuple[]
│   constraints = Dict{Symbol, Vector}(:j => Any[:((Tullio.linearindex)(coef)), :((Tullio.linearindex)(I))], :i => Any[:((Tullio.linearindex)(x))])
│   cost = 3
│   cuda = true
│   fastmath = true
│   finaliser = :identity
│   grad = :Dual
│   init = :(zero(𝒯))
│   initkeyword = :𝒯
│   leftarray = :poly
│   leftind = [:i]
│   leftnames = Symbol[]
│   leftraw = Any[:i]
│   leftscalar = nothing
│   mod = Main
│   newarray = true
│   nograd = [:x, :I]
│   notfree = Symbol[]
│   outex = Expr[:(local 𝒶𝓍j = (Tullio.linearindex)(coef)), :((Tullio.linearindex)(I) == (Tullio.linearindex)(coef) || (throw)("range of index j must agree")), :( ...
│   outpre = Expr[:((all)((𝒶𝓍->begin
                  (axes)(𝒶𝓍) == (axes)((first)(I))
              end), I) || (throw)("elements I[j] must be of uniform size")) ...
│   padkeyword = :𝒯
│   padmodclamp = false
│   plusequals = false
│   redfun = :+
│   redind = [:j]
│   right = :(coef[j] * x[i] ^ (I[j])[1])
│   rightind = [:j, :i]
│   scalar = false
│   scalars = Symbol[]
│   sharedind = Symbol[]
│   shiftedind = Symbol[]
│   threads = true
│   unsafeleft = Symbol[]
│   unsaferight = Symbol[]
│   verbose = 2
│   zero = false
└
generate_polynomial

1

0

1-element Vector{Float64}:
 0.7905131946859776

1-element Vector{Float64}:
 0.9845248220058904

Enter Polynomial::generate_polynomial, length x, y: (1,)(1,)

1

1-element Vector{Float64}:
 0.15588748892996085

#190 (generic function with 1 method)

┌ Info: left index ranges
└   i = Base.OneTo(1)
┌ Info: reduction index ranges
└   j = Base.OneTo(1)
ERROR: UndefVarError: 🐳 not defined
Stacktrace:
  [1] ∇𝒜𝒸𝓉!
    @ ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1211 [inlined]
  [2] tile_halves(fun!::var"#∇𝒜𝒸𝓉!#186", ::Type{Union{Nothing, Vector{Float64}}}, As::Tuple{Vector{Float64}, Nothing, Nothing, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Int64}}, Is::Tuple{}, Js::Tuple{UnitRange{Int64}, UnitRange{Int64}}, keep::Nothing, final::Bool)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/threads.jl:139
  [3] tile_halves
    @ ~/.julia/packages/Tullio/NGyNM/src/threads.jl:136 [inlined]
  [4] ∇threader(fun!::var"#∇𝒜𝒸𝓉!#186", #unused#::Type{Union{Nothing, Vector{Float64}}}, As::Tuple{Vector{Float64}, Nothing, Nothing, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Float64}, Vector{Int64}}, I0s::Tuple{}, J0s::Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}}, block::Int64)
    @ Tullio ~/.julia/packages/Tullio/NGyNM/src/threads.jl:103
  [5] (::var"#942#∇ℳ𝒶𝓀ℯ#187"{var"#∇𝒜𝒸𝓉!#186"})(𝛥ℛ::Vector{Float64}, ℛ::Vector{Float64}, coef::Vector{Float64}, x::Vector{Float64}, I::Vector{Int64})
    @ Main ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1300
  [6] tullio_back
    @ ~/.julia/packages/Tullio/NGyNM/src/eval.jl:53 [inlined]
  [7] (::Zygote.ZBack{Tullio.var"#tullio_back#156"{Tullio.Eval{var"#ℳ𝒶𝓀ℯ#188"{var"#𝒜𝒸𝓉!#185"}, var"#942#∇ℳ𝒶𝓀ℯ#187"{var"#∇𝒜𝒸𝓉!#186"}}, Tuple{Vector{Float64}, Vector{Float64}, Vector{Int64}}, Vector{Float64}}})(dy::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/chainrules.jl:206
  [8] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/tullio_zygote_mwe.jl:34 [inlined]
  [9] (::typeof(∂(λ)))(Δ::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [10] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/tullio_zygote_mwe.jl:48 [inlined]
 [11] (::typeof(∂(#190)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [12] (::Zygote.var"#60#61"{typeof(∂(#190))})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:45
 [13] gradient(f::Function, args::Vector{Float64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:97
 [14] top-level scope
    @ ~/src/2022/basic_UODE/custom_lux_layer/tullio_zygote_mwe.jl:49

One can see the appearance of the \teapot symbol during the derivative calculation without it having been defined. Where is comes from I have no idea! (@ChrisRackauckas)

erlebach · December 26, 2022, 1:51pm

Could I write your expression as follows? Notice the conditional at the end. The answer is no, I can’t.

 f(a,x,y) = @tullio z[i] := a[n+1, m+1] * x[i]^n * y[i]^m    if n + m <= 3

Two points though: 1st, I may wish to skip some indices so listing the desired indexes in a list seems appropriate. Second, I may wish to combine monomials and other more complex basis functions, perhaps parametrized, for example: a cos\omega t which has two parameters I wish to estimate. So would rather stick to the more general solution if possible. Your solution works if that is what I wanted to do.

I believe the issue is in Zygote, or in ChainRules? Perhaps the expression

x^(I[j][1])

is too complex to parse? Perhaps the indexing [1] is the culprit? That would suggest an even more MWE.

The following worked, so the issue is likely either with me, or with Tullio, probably in that order.

I = [(1,2), (2,3)]
x = rand(5)
loss = x -> sum(x.^(I[1][2]) + x.^(I[2][1]))
Zygote.gradient(loss, x)

erlebach · December 26, 2022, 2:23pm

@ChrisRackauckas, @mcabbott

I believe I have identified the error. The following leads to an error with Zygote:

 I = [(i) for i in Ix]
 Poly(coef) = @tullio poly[i] := coef[j] * x[i]^(I[j][1])  grad=Dual nograd=x nograd=I verbose=2

But the following works (I changed I[j][1] to I[j]

 I = [i for i in Ix]
 Poly(coef) = @tullio poly[i] := coef[j] * x[i]^(I[j])  grad=Dual nograd=x nograd=I verbose=2

To me, this suggests a missing component or error in the chain rules used by Tullio, or perhaps I am using a construct that is simply not acceptable. In the meantime, I have a fix: replace the list of tuples of length two by two lists of integers. That should resolve the problem, but this would create problems if I wish to generalize to multigression with a higher number of variables.

I checked that the following works:

function generate_polynomial(N, degree, x, y)

    Ix = 0:degree
    Iy = 0:degree

    # for 1D polynomials
    IJ = [(i,j) for i in Ix for j in Iy if i + j < (degree+1)]
    I = [i[1] for i in IJ]
    J = [i[2] for i in IJ]

     Poly(coef) = @tullio poly[i] := coef[j] * x[i]^I[j] * y[i]^J[j] grad=Dual nograd=x nograd=y nograd=I nograd=J

    return Poly 
end

N = 128
degree = 4 # single term
x = rand(N)
y = rand(N)
poly = generate_polynomial(N, degree, x, y);
nb_terms = Int((degree+1)*(degree+2) / 2)
coef = rand(nb_terms)
loss = x -> sum(poly(x).^2)
Zygote.gradient(loss, coef)

So if could resolve the original problem, that would be great.

Thanks as usual!

mcabbott · December 26, 2022, 2:33pm

I have no memory of this but the fail whale seems to come from here:

github.com

mcabbott/Tullio.jl/blob/9dcd3ad36a94c5a928cb4d6446c6bd26a84c272c/src/forward.jl#L70-L73


      
          arrayplusepsilon(A, inds, dict) = begin
              @debug string("expression ", A, " is why you can't use ForwardDiff here")
              :🐳
          end

which is trying to say that indexing two levels deep I[i][1] is too hard for the grad=Dual method to differentiate. (Probably it should be an error, not @debug.)

You can work around it by not making deeply nested structures, e.g. making a tuple of powers for x, and a separate one for y.

Maybe. I mean evaluating polynomials with all coefficients is always going to be more straightforward than evaluating them with some subset of the possible coefficients – that’s what you’re trying to do with this list of powers. If the only restriction is m+n<4 then at best this is an optimisation which may save you a factor of 2, compared to just keeping the zero coefficients. I suspect there are much larger factors around.

This is a harder problem than polynomials, and the design is likely to be quite different – involving some tuple of functions, perhaps.

erlebach · December 26, 2022, 2:45pm

Yep, see further message. I figured that out already! This was another exercise in MWE construction and debugging!

erlebach · December 27, 2022, 2:33pm

@ChrisRackauckas, @mcabbott

I tried using macros to generate my lists, as follows:

macro IJ(n::Symbol) # n is degree
    return esc(quote
        f = []
        I = Vector{Int}()
        J = Vector{Int}()
        if $n < 0
            print("The input must be equal to or bigger than zero.") 
        else  
            for i in 0:$n
            for j in 0:$n
                if i+j <= $n
                    push!(I, i)
                    push!(J, j)
                end
            end
            end
        end
        (I, J)
    end)
end

and then the polynomial generator becomes:

function generate_polynomial(degree)
    if degree != 2
        println("Degree must be 2 or 3 for hardcode testing")
    end

    # Hardcode the indices, and see if mutation error goes away
    I, J = @IJ degree
    Poly(coef, x, y) = @tullio poly[i] := coef[j] * x[i]^I[j] * y[i]^J[j] grad=Dual nograd=x nograd=y nograd=I nograd=J 
end

The following then works:

N = 128
d = 2 
poly = generate_polynomial(d)
nb_terms = Int((d+1)*(d+2)/2)
Zygote.gradient((coef, x, y) -> (sum∘poly)(coef, x, y), rand(nb_terms), rand(N), rand(N))

However, if I execute the gradient indirectly as follows: (see earlier messages for more context)

Zygote.gradient((coef, data) -> loss_function(model, coef, data)[1], ps, randn(128,2))

where the loss_function is:

function loss_function(model, ps, data)
    global st   # st was already global, so why is global necessary?
    y_pred, _ = model(data, ps, st)  # Model should return size (model.out_dims, N). Update global st
    mse_loss = mean(abs2, y_pred)     # mutated data?
    return mse_loss, ()   # what is ()? 
end

then I get a mutation error:

ERROR: Mutating arrays is not supported -- called push!(Vector{Int64}, ...)
This error occurs when you ask Zygote to differentiate operations that change
the elements of arrays in place (e.g. setting values with x .= ...)

Possible fixes:
- avoid mutating operations (preferred)
- or read the documentation and solutions for this error
  https://fluxml.ai/Zygote.jl/latest/limitations

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] _throw_mutation_error(f::Function, args::Vector{Int64})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:86
  [3] (::Zygote.var"#397#398"{Vector{Int64}})(#unused#::Nothing)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/lib/array.jl:105
  [4] (::Zygote.var"#2508#back#399"{Zygote.var"#397#398"{Vector{Int64}}})(Δ::Nothing)
    @ Zygote ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:67
  [5] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/MWE_training_loop.jl:24 [inlined]
  [6] (::typeof(∂(generate_polynomial)))(Δ::Nothing)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
  [7] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/MWE_training_loop.jl:110 [inlined]
  [8] (::typeof(∂(λ)))(Δ::Tuple{Matrix{Float64}, Nothing})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
  [9] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/MWE_training_loop.jl:175 [inlined]
 [10] Pullback
    @ ~/src/2022/basic_UODE/custom_lux_layer/MWE_training_loop.jl:193 [inlined]
 [11] (::typeof(∂(#32)))(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface2.jl:0
 [12] (::Zygote.var"#60#61"{typeof(∂(#32))})(Δ::Float64)
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:45
 [13] gradient(::Function, ::ComponentVector{Float32, Vector{Float32}, Tuple{Axis{(coeffs = ViewAxis(1:12, ShapedAxis((6, 2), NamedTuple())),)}}}, ::Vararg{Any})
    @ Zygote ~/.julia/packages/Zygote/SmJK6/src/compiler/interface.jl:97
 [14] top-level scope
    @ ~/src/2022/basic_UODE/custom_lux_layer/MWE_training_loop.jl:193

Question: would this mutation error have anything to do with either of the macros? I would say that is not possible since the macros are created expressly to allow complex to be created and then used as if typed by hand, and also because the Zygote gradient works when applied outside a function.

I hardcoded the lists inside the generate_polynomials function:

if degree == 2
        IJ = [(0,0), (1,0), (2,0), (0,1), (0,2), (1,1)]
        I = (0,1,2,0,0,1)
        J = (0,0,0,1,2,1)
    elseif degree == 3
        IJ = [(0,0), (1,0), (0,1), (2,0), (1,1), (0,2), (3,0), (2,1), (1,2), (0,3)]
        I = (0, 1, 0, 2, 1, 0, 3, 2, 1, 0)
        J = (0, 0, 1, 0, 1, 2, 0, 1, 2, 3)
    else
        println("Degree greater than 3 not implemented")
    end

and removed the macros. In that case, both computations of the gradient worked fine.

What could be the issue? Why are macros not completely eliminating the mutation issue? Thanks.

Topic		Replies	Views
Applying a macro to a variable General Usage	14	806	January 8, 2023
Mutating versus non-mutating arrays for Zygote Gradient General Usage	6	348	December 25, 2022
Mutation error in Zygote hessian Machine Learning question , package	2	185	June 22, 2023
Vectorization over subset of indices General Usage	8	275	December 25, 2022
Flux Zygote Gradient: Understanding Mutating arrays is not supported Machine Learning	21	4131	December 3, 2020

Tullio and the nograd option

Related topics