Relatively complex function breaking with Zygote - what limitations exist for differentiable programming?

Hello, I am new to Julia (although not really to programming) and not a top brain in calculus or statistical learning, although I think I have enough background to understand complex stuff if I put in a bit of time :smiley: I was evaluating what kind of code complexity I can use with Zygote.
I wrote this snippet:

using Zygote

@enum TaskType TaskA=1 TaskB

function process_items(cost_per_item::Int64, items_per_tick::Int64, items::Array{TaskType, 1})
    processing::Array{Int64,1} = Int64[]
    for i in items
        cost = cost_per_item
        if i == TaskB
            cost = cost_per_item * 2
        end
        push!(processing, cost)
    end

    to_process::Int64 = size(processing)[1]
    ticks::Int64 = 0
    while to_process > 0
        processed_this_tick::Int64 = 0
        next::Int64 = 1
        while processed_this_tick < items_per_tick && (next <= size(processing)[1])
            processing[next] -= 1
            if processing[next] <= 0
                deleteat!(processing, next)
            else
                next += 1
            end
            processed_this_tick += 1
        end
        ticks += 1
        to_process = size(processing)[1]
    end

    return ticks
end

cpi = 2
ipt = 4
g = gradient(
    (cost_per_item, item_per_tick) -> process_items(cost_per_item, item_per_tick, [TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA])
    , cpi, ipt)
println(g)

My idea was to get gradients for (cost_per_item, items_per_tick) while calling the function on some known “Tasks”; the plan is then to write an “optimizer” that could find the optimal values for those parameters, given the Tasks, via e.g. backpropagation.

If I try to run the code, I get:

ERROR: LoadError: Can't differentiate foreigncall expression
Stacktrace:
  [1] error(s::String)
    @ Base .\error.jl:33
  [2] Pullback
    @ .\array.jl:895 [inlined]
  [3] (::typeof(∂(_deleteat!)))(Δ::Nothing)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
  [4] Pullback
    @ .\array.jl:1339 [inlined]
  [5] (::typeof(∂(deleteat!)))(Δ::Nothing)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
  [6] Pullback
    @ C:\Users\Rudi\Documents\myprojects\julia_experiments\micro_processor.jl:23 [inlined]
  [7] (::typeof(∂(process_items)))(Δ::Int64)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
  [8] Pullback
    @ C:\Users\Rudi\Documents\myprojects\julia_experiments\micro_processor.jl:51 [inlined]
  [9] (::typeof(∂(#1)))(Δ::Int64)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface2.jl:0
 [10] (::Zygote.var"#50#51"{typeof(∂(#1))})(Δ::Int64)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface.jl:41
 [11] gradient(::Function, ::Int64, ::Vararg{Int64, N} where N)
    @ Zygote C:\Users\Rudi\.julia\packages\Zygote\nsu1Y\src\compiler\interface.jl:76
 [12] top-level scope
    @ C:\Users\Rudi\Documents\myprojects\julia_experiments\micro_processor.jl:50
in expression starting at C:\Users\Rudi\Documents\myprojects\julia_experiments\micro_processor.jl:50

Now, from this I gather that I can’t use “deleteat!”, which might even be understandable, since it has side-effects-oh-my etc. However, I was wondering:

  • Is “deleteat!” really not expected to work with Zygote?
  • What should I be careful about when writing code I want to differentiate, e.g. which constructs or functions should I avoid? For instance, “push!” seems to be OK in this case, is it?
  • Maybe a difficult question, but: is what I am trying to do (getting gradients for just those 2 variables, then back-propagating) something I should be able to accomplish?

Thanks everybody in advance!

No, deleteat! and push! are not expected to work. In general you will want to avoid anything which mutates an array in code that you want Zygote to process. For instance, the first loop for i in items looks like it could easily be done with map, or a list comprehension.
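For instance, a non-mutating version of that first loop might look something like this (a sketch; `task_costs` is a hypothetical helper name, not from the original code):

```julia
@enum TaskType TaskA=1 TaskB

# Non-mutating take on the first loop: one cost per item,
# doubled for TaskB. A list comprehension would work just as well.
task_costs(cost_per_item, items) =
    map(t -> t == TaskB ? 2 * cost_per_item : cost_per_item, items)

task_costs(2, [TaskA, TaskB, TaskA])  # == [2, 4, 2]
```

Because no array is mutated, Zygote can trace through `map` without hitting the `foreigncall` error.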

1 Like

Thanks @mcabbott , I rewrote the code to be more “functional” and avoid mutating collections:

using Zygote

@enum TaskType TaskA=1 TaskB

function assign_task_cost(base_cost::Int64, t::TaskType)::Int64
    c::Int64 = base_cost
    if t == TaskB
        c *= 2
    end
    return c
end

function process(indices_to_process, index_and_value::Tuple{Int64, Int64})
    index::Int64 = index_and_value[1]
    val::Int64 = index_and_value[2]
    return index in indices_to_process ? val - 1 : val
end

function process_items(cost_per_item::Int64, items_per_tick::Int64, items::Array{TaskType, 1})::Int64
    cost_assigner = Base.Fix1(assign_task_cost, cost_per_item)
    processing::Array{Int64,1} = map(cost_assigner, items)
    ticks::Int64 = 0
    while !isempty(processing)
        indices_to_process::Array{Int64, 1} = collect(1:items_per_tick)
        process_function = Base.Fix1(process, indices_to_process)
        processing = map(process_function, enumerate(processing))
        processing = filter(>(0), processing)
        ticks += 1
    end
    return ticks
end

Then, I try to get gradients this way:

cpi = 2
ipt = 4
g = gradient(
    (cost_per_item, item_per_tick) -> process_items(cost_per_item, item_per_tick, [TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA])
    , cpi, ipt)
println(g)

This now runs to completion, but the output I get is (nothing, nothing).
Do you know whether the way I am calling “gradient” is correct? And do you have any idea why it “fails” this way?
Thank you so much!

Given that everything is an integer, I’m not sure gradients even make sense here.
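To see why, consider a toy model (hypothetical, not the code above) of the same kind of computation: with integer inputs and an integer tick count, the output can only move in whole steps, so there is no slope for a gradient to measure.

```julia
# Hypothetical simplified model of the simulation: n_items identical
# items, items_per_tick processed each tick, so the tick count is
# ceil(n_items / items_per_tick), i.e. cld in Julia.
count_ticks(items_per_tick; n_items = 8) = cld(n_items, items_per_tick)

count_ticks(4)  # == 2
count_ticks(3)  # == 3  -- the output jumps between integers,
                # it never varies smoothly with the input
```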

You are right :smiley:
However, I have changed it this way

using Zygote

@enum TaskType TaskA=1 TaskB

function assign_task_cost(base_cost::Int64, t::TaskType)::Int64
    c::Int64 = base_cost
    if t == TaskB
        c *= 2
    end
    return c
end

function process(indices_to_process, index_and_value::Tuple{Int64, Int64})
    index::Int64 = index_and_value[1]
    val::Int64 = index_and_value[2]
    return index in indices_to_process ? val - 1 : val
end

function map_float_to_discrete_uniformly(w::Float64, num_vals::Int64)::Int64
    s = 1.0 / num_vals
    return floor(w / s)
end

function process_items(cost_per_item_weight::Float64, items_per_tick_weight::Float64, items::Array{TaskType, 1})::Float64
    # Globals-like
    costs::Array{Int64, 1} = [2, 3, 4]
    items_at_time::Array{Int64, 1} = [1, 2, 3, 4] 

    cost_per_item = costs[map_float_to_discrete_uniformly(cost_per_item_weight, length(costs)) + 1]
    items_per_tick = items_at_time[map_float_to_discrete_uniformly(items_per_tick_weight, length(items_at_time)) + 1]
    println("cost_per_item: ", cost_per_item)
    println("items_per_tick: ", items_per_tick)

    cost_assigner = Base.Fix1(assign_task_cost, cost_per_item)
    processing::Array{Int64,1} = map(cost_assigner, items)
    ticks::Int64 = 0
    while !isempty(processing)
        indices_to_process::Array{Int64, 1} = collect(1:items_per_tick)
        process_function = Base.Fix1(process, indices_to_process)
        processing = map(process_function, enumerate(processing))
        processing = filter(>(0), processing)
        ticks += 1
    end
    println("ticks: ", ticks)
    return 1.0 / ticks
end
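One detail worth flagging in the snippet above: `map_float_to_discrete_uniformly` rounds with `floor`, which makes it a step function of the weight. A standalone copy shows that nearby weights collapse to the same index:

```julia
# Standalone copy of the quantizer above: partitions [0, 1) into
# num_vals equal buckets and returns the 0-based bucket index.
map_float_to_discrete_uniformly(w, num_vals) = Int(floor(w / (1.0 / num_vals)))

map_float_to_discrete_uniformly(0.10, 3)  # == 0
map_float_to_discrete_uniformly(0.34, 3)  # == 1

# Within a bucket the index never changes, so the derivative of the
# whole pipeline with respect to the weight is zero (and undefined
# exactly at a bucket boundary).
map_float_to_discrete_uniformly(0.11, 3) == map_float_to_discrete_uniformly(0.10, 3)  # true
```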

And then I call similarly

cpi_w = 0.1
ipt_w = 0.9
g = gradient(
    (cost_per_item_w, item_per_tick_w) -> process_items(cost_per_item_w, item_per_tick_w, [TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA, TaskA])
    , cpi_w, ipt_w)
println(g)

But I still get “(nothing, nothing)”…

I am definitely missing the context of what you are trying to do, but the function you are trying to optimize is still not a “continuous”/differentiable function of its parameters/inputs, so the gradients don’t really make sense.
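One common workaround is to replace the discrete simulation with a smooth surrogate and differentiate that instead. A sketch, under the simplifying assumption that all n items share one cost c and can be fractionally processed k at a time: the tick count is then roughly n·c/k, which is smooth in both c and k. Checking its hand-computed partials against finite differences (so the sketch does not depend on Zygote):

```julia
# Hypothetical smooth surrogate for the simulation: n identical items
# of cost c, processed k per tick => roughly n*c/k ticks, once c and k
# are allowed to take real values.
smooth_ticks(c, k; n = 8) = n * c / k

# Analytic partials: d/dc = n/k and d/dk = -n*c/k^2.
c, k = 2.0, 4.0

# Verify with central finite differences:
h = 1e-6
fd_c = (smooth_ticks(c + h, k) - smooth_ticks(c - h, k)) / (2h)  # ≈  8/k     ==  2.0
fd_k = (smooth_ticks(c, k + h) - smooth_ticks(c, k - h)) / (2h)  # ≈ -8c/k^2  == -1.0
```

A continuous relaxation like this is something Zygote can differentiate, and the resulting gradients can then drive an optimizer; the step-shaped simulation itself cannot.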