Suggestions to improve Zygote performance for simple vector map/broadcast/comprehension?

It depends on what code is running inside the comprehension, but probably. The key performance pitfall in the OP is that it uses control flow (conditionals and loops). Zygote isn’t able to generate efficient code for functions that use control flow, so you’ll see both slower execution and more allocations. Calling such functions inside array comprehensions, map, or broadcast is a worst-case scenario because the overhead is paid once per element processed.

We can show the impact of removing control flow by using a branchless conditional (ifelse) instead of the ternary:

build_vector2(x) = [ifelse(i<500, x, 0) for i=1:1000]

julia> @btime gradient(x -> sum(build_vector(x)), 1);
  830.159 μs (6559 allocations: 285.09 KiB)

julia> @btime gradient(x -> sum(build_vector2(x)), 1);
  17.263 μs (44 allocations: 119.25 KiB)
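For anyone who wants to reproduce the comparison, here is a minimal self-contained sketch. The definition of build_vector is my assumed reconstruction of the OP's ternary version (it is not shown in this post), so treat it as illustrative rather than the exact original.

```julia
using Zygote

# Assumed reconstruction of the OP's version: a ternary (real control flow)
# inside the comprehension.
build_vector(x) = [i < 500 ? x : 0 for i = 1:1000]

# Branchless version from this post: ifelse is an ordinary function call,
# so there is no branch for Zygote to record.
build_vector2(x) = [ifelse(i < 500, x, 0) for i = 1:1000]

g1 = gradient(x -> sum(build_vector(x)), 1)[1]
g2 = gradient(x -> sum(build_vector2(x)), 1)[1]

# Elements i = 1..499 depend on x, so mathematically the derivative is 499.
@assert g1 == g2
```

Note that ifelse evaluates both of its value arguments before choosing one, so it is only a drop-in replacement when both branches are cheap and safe to evaluate.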

However, some functions must use control flow. In that case, you have a few options:

  1. Use the tools described in API · ChainRules (for example ignore_derivatives or @non_differentiable from ChainRulesCore) around functions/code blocks that use control flow but don’t need to be differentiated.
  2. Define your own rrule(s) for functions that use control flow. The advice in Writing good rules · ChainRules applies as always, but one additional concern here is to make sure the type of the returned pullback function is stable. If it isn’t, you’ll run into many of the same issues as Zygote does.
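Both options can be sketched roughly as follows. The functions count_small, loss and leaky are hypothetical names made up for illustration. The rule is restricted to AbstractFloat inputs so that the captured slope always has the same type as x.

```julia
using Zygote
using ChainRulesCore

# Hypothetical helper with a loop and a branch. Its result is treated as a
# constant, so it does not need to be differentiated.
function count_small(v)
    n = 0
    for e in v
        if e < 0.5
            n += 1
        end
    end
    return n
end

# Option 1: hide the control flow from AD with ignore_derivatives.
loss(x, v) = x^2 * ignore_derivatives(() -> count_small(v))

# Hypothetical scalar function with a branch that does need a derivative.
leaky(x::Real) = x > 0 ? x : x / 100

# Option 2: a hand-written rrule. The branch is resolved into a plain number
# up front, and the same closure is returned on every path, so the pullback
# has a single concrete type.
function ChainRulesCore.rrule(::typeof(leaky), x::AbstractFloat)
    slope = x > 0 ? one(x) : one(x) / 100
    leaky_pullback(ȳ) = (NoTangent(), ȳ * slope)
    return leaky(x), leaky_pullback
end

g_loss = gradient(x -> loss(x, [0.1, 0.9, 0.3]), 3.0)[1]  # d/dx of 2x^2 at x = 3
g_neg = gradient(leaky, -2.0)[1]
g_pos = gradient(leaky, 2.0)[1]
```

The thing to avoid in the rule is returning a different closure from each branch of an if, since the return type of rrule then becomes a Union. Something like Test.@inferred rrule(leaky, 1.0) is one way to check that the rule infers to a concrete type.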