Relatively complex function breaking with Zygote - what limitations exist for differentiable programming?

Hello, I am new to Julia (although not really to programming) and not a top brain in calculus or statistical learning, although I think I have enough background to understand complex stuff if I put in a bit of time :smiley: I was evaluating what kind of code complexity I can use with Zygote.
I wrote this snippet:

using Zygote

@enum TaskType TaskA=1 TaskB

function process_items(cost_per_item::Int64, items_per_tick::Int64, items::Array{TaskType, 1})
    processing::Array{Int64,1} = Int64[]
    for i in items
        cost = cost_per_item
        if i == TaskB
            cost = cost_per_item * 2
        end
        push!(processing, cost)
    end

    to_process::Int64 = size(processing)[1]
    ticks::Int64 = 0
    while to_process > 0
        processed_this_tick::Int64 = 0
        next::Int64 = 1
        while processed_this_tick < items_per_tick && (next <= size(processing)[1])
            processing[next] -= 1
            if processing[next] <= 0
                deleteat!(processing, next)
            else
                next += 1
            end
            processed_this_tick += 1
        end
        ticks += 1
        to_process = size(processing)[1]
    end

    return ticks
end

cpi = 2
ipt = 4
g = gradient(
    (cost_per_item, item_per_tick) -> process_items(cost_per_item, item_per_tick, [TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA])
    , cpi, ipt)
println(g)

My idea was to get gradients for (cost_per_item, items_per_tick) while calling the function on some known “Tasks”; the plan is then to write an “optimizer” that could find the optimal values for those parameters, given the Tasks, via e.g. backpropagation.

If I try to run the code, I get:

ERROR: LoadError: Can't differentiate foreigncall expression
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:33
  [2] Pullback
    @ .\array.jl:895 [inlined]
  [3] (::typeof(∂(_deleteat!)))(Δ::Nothing)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
  [4] Pullback
    @ .\array.jl:1339 [inlined]
  [5] (::typeof(∂(deleteat!)))(Δ::Nothing)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
  [6] Pullback
    @ C:\Users\Rudi\Documents\myprojects\julia_experiments\micro_processor.jl:23 [inlined]
  [7] (::typeof(∂(process_items)))(Δ::Int64)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
  [8] Pullback
    @ C:\Users\Rudi\Documents\myprojects\julia_experiments\micro_processor.jl:51 [inlined]
  [9] (::typeof(∂(#1)))(Δ::Int64)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
 [10] (::Zygote.var"#50#51"{typeof(∂(#1))})(Δ::Int64)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface.jl:41
 [11] gradient(::Function, ::Int64, ::Vararg{Int64, N} where N)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface.jl:76
 [12] top-level scope
    @ C:\Users\Rudi\Documents\myprojects\julia_experiments\micro_processor.jl:50
in expression starting at C:\Users\Rudi\Documents\myprojects\julia_experiments\micro_processor.jl:50

Now, from this I gather that I can’t use “deleteat!”, which might even be understandable, since it has side-effects-oh-my etc. However, I was wondering:

  • Is “deleteat!” really not expected to work with Zygote?
  • What should I be careful about when writing code I want to differentiate, e.g. which constructs or functions should I avoid? For instance, “push!” seems to be OK in this case, is it?
  • Maybe a difficult question, but: is what I am trying to do (getting gradients for just those 2 variables, then back-propagating) something I should be able to accomplish?

Thanks everybody in advance!

No, deleteat! and push! are not expected to work. In general you will want to avoid anything which mutates an array in code that you want Zygote to process. For instance, the first loop for i in items looks like it could easily be done with map, or a list comprehension.
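For instance, a non-mutating version of that first loop might look something like this (a sketch; `task_costs` is a hypothetical helper name, not from the original code):

```julia
@enum TaskType TaskA=1 TaskB

# Non-mutating take on the first loop: one cost per item,
# doubled for TaskB. A list comprehension would work just as well.
task_costs(cost_per_item, items) =
    map(t -> t == TaskB ? 2 * cost_per_item : cost_per_item, items)

task_costs(2, [TaskA, TaskB, TaskA])  # == [2, 4, 2]
```

Because no array is mutated, Zygote can trace through `map` without hitting the `foreigncall` error.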

1 Like

Thanks @mcabbott , I rewrote the code to be more “functional” and avoid mutating collections:

using Zygote

@enum TaskType TaskA=1 TaskB

function assign_task_cost(base_cost::Int64, t::TaskType)::Int64
    c::Int64 = base_cost
    if t == TaskB
        c *= 2
    end
    return c
end

function process(indices_to_process, index_and_value::Tuple{Int64, Int64})
    index::Int64 = index_and_value[1]
    val::Int64 = index_and_value[2]
    return index in indices_to_process ? val - 1 : val
end

function process_items(cost_per_item::Int64, items_per_tick::Int64, items::Array{TaskType, 1})::Int64
    cost_assigner = Base.Fix1(assign_task_cost, cost_per_item)
    processing::Array{Int64,1} = map(cost_assigner, items)
    ticks::Int64 = 0
    while !isempty(processing)
        indices_to_process::Array{Int64, 1} = collect(1:items_per_tick)
        process_function = Base.Fix1(process, indices_to_process)
        processing = map(process_function, enumerate(processing))
        processing = filter(>(0), processing)
        ticks += 1
    end
    return ticks
end

Then, I try to get gradients this way:

cpi = 2
ipt = 4
g = gradient(
    (cost_per_item, item_per_tick) -> process_items(cost_per_item, item_per_tick, [TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA])
    , cpi, ipt)
println(g)

This now runs to completion, but the output I get is (nothing, nothing).
Do you know whether the way I am calling “gradient” is correct? And do you have any idea why it “fails” this way?
Thank you so much!

Given that everything is an integer, I’m not sure gradients even make sense here.
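To see why, consider a toy model (hypothetical, not the code above) of the same kind of computation: with integer inputs and an integer tick count, the output can only move in whole steps, so there is no slope for a gradient to measure.

```julia
# Hypothetical simplified model of the simulation: n_items identical
# items, items_per_tick processed each tick, so the tick count is
# ceil(n_items / items_per_tick), i.e. cld in Julia.
count_ticks(items_per_tick; n_items = 8) = cld(n_items, items_per_tick)

count_ticks(4)  # == 2
count_ticks(3)  # == 3  -- the output jumps between integers,
                # it never varies smoothly with the input
```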

You are right :smiley:
However, I have changed it this way

using Zygote

@enum TaskType TaskA=1 TaskB

function assign_task_cost(base_cost::Int64, t::TaskType)::Int64
    c::Int64 = base_cost
    if t == TaskB
        c *= 2
    end
    return c
end

function process(indices_to_process, index_and_value::Tuple{Int64, Int64})
    index::Int64 = index_and_value[1]
    val::Int64 = index_and_value[2]
    return index in indices_to_process ? val - 1 : val
end

function map_float_to_discrete_uniformly(w::Float64, num_vals::Int64)::Int64
    s = 1.0 / num_vals
    return floor(w / s)
end

function process_items(cost_per_item_weight::Float64, items_per_tick_weight::Float64, items::Array{TaskType, 1})::Float64
    # Globals-like
    costs::Array{Int64, 1} = [2, 3, 4]
    items_at_time::Array{Int64, 1} = [1, 2, 3, 4] 

    cost_per_item = costs[map_float_to_discrete_uniformly(cost_per_item_weight, length(costs)) + 1]
    items_per_tick = items_at_time[map_float_to_discrete_uniformly(items_per_tick_weight, length(items_at_time)) + 1]
    println("cost_per_item: ", cost_per_item)
    println("items_per_tick: ", items_per_tick)

    cost_assigner = Base.Fix1(assign_task_cost, cost_per_item)
    processing::Array{Int64,1} = map(cost_assigner, items)
    ticks::Int64 = 0
    while !isempty(processing)
        indices_to_process::Array{Int64, 1} = collect(1:items_per_tick)
        process_function = Base.Fix1(process, indices_to_process)
        processing = map(process_function, enumerate(processing))
        processing = filter(>(0), processing)
        ticks += 1
    end
    println("ticks: ", ticks)
    return 1.0 / ticks
end
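One detail worth flagging in the snippet above: `map_float_to_discrete_uniformly` rounds with `floor`, which makes it a step function of the weight. A standalone copy shows that nearby weights collapse to the same index:

```julia
# Standalone copy of the quantizer above: partitions [0, 1) into
# num_vals equal buckets and returns the 0-based bucket index.
map_float_to_discrete_uniformly(w, num_vals) = Int(floor(w / (1.0 / num_vals)))

map_float_to_discrete_uniformly(0.10, 3)  # == 0
map_float_to_discrete_uniformly(0.34, 3)  # == 1

# Within a bucket the index never changes, so the derivative of the
# whole pipeline with respect to the weight is zero (and undefined
# exactly at a bucket boundary).
map_float_to_discrete_uniformly(0.11, 3) == map_float_to_discrete_uniformly(0.10, 3)  # true
```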

And then I call similarly

cpi_w = 0.1
ipt_w = 0.9
g = gradient(
    (cost_per_item_w, item_per_tick_w) -> process_items(cost_per_item_w, item_per_tick_w, [TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA])
    , cpi_w, ipt_w)
println(g)

But I still get “(nothing, nothing)”…

I am definitely missing the context of what you are trying to do, but the function you are trying to optimize is still not a “continuous”/differentiable function of its parameters/inputs, so the gradients don’t really make sense.
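One common workaround is to replace the discrete simulation with a smooth surrogate and differentiate that instead. A sketch, under the simplifying assumption that all n items share one cost c and can be fractionally processed k at a time: the tick count is then roughly n·c/k, which is smooth in both c and k. Checking its hand-computed partials against finite differences (so the sketch does not depend on Zygote):

```julia
# Hypothetical smooth surrogate for the simulation: n identical items
# of cost c, processed k per tick => roughly n*c/k ticks, once c and k
# are allowed to take real values.
smooth_ticks(c, k; n = 8) = n * c / k

# Analytic partials: d/dc = n/k and d/dk = -n*c/k^2.
c, k = 2.0, 4.0

# Verify with central finite differences:
h = 1e-6
fd_c = (smooth_ticks(c + h, k) - smooth_ticks(c - h, k)) / (2h)  # ≈  8/k     ==  2.0
fd_k = (smooth_ticks(c, k + h) - smooth_ticks(c, k - h)) / (2h)  # ≈ -8c/k^2  == -1.0
```

A continuous relaxation like this is something Zygote can differentiate, and the resulting gradients can then drive an optimizer; the step-shaped simulation itself cannot.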