Good to know that there’s still room for improvement!
It turns out that Julia’s new parser-level broadcast fusion was bypassing ReverseDiff’s primitives, so the broadcast operations were getting unrolled in ReverseDiff’s tape.
Do I understand correctly that Julia’s parser was transforming the code into a form (probably `broadcast!(...)`-style calls) that ReverseDiff can’t recognize, so it can’t apply its own optimizations for broadcasting?
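For anyone following along, the fusion is visible directly in the lowered code. This is only an illustrative sketch on a recent Julia version; the `logistic` definition and the array shapes here are my own assumptions, not from the thread:

```julia
logistic(z) = 1 / (1 + exp(-z))  # assumed definition, matching the thread's usage
We1 = rand(3, 2); x = rand(2); b1 = rand(3)

# The parser fuses the nested dot calls into a single broadcast expression,
# so the whole right-hand side becomes one fused kernel. Roughly, it lowers to:
#   %1 = We1 * x
#   %2 = Base.broadcasted(+, %1, b1)
#   %3 = Base.broadcasted(logistic, %2)
#   %4 = Base.materialize(%3)   # one loop applies the fused kernel elementwise
Meta.@lower logistic.(We1 * x .+ b1)
```

An AD tool that intercepts `.+` and `logistic.` as separate array-level primitives never sees them here as distinct calls; it only sees the fused kernel, which is (as I understand it) why the broadcast operations end up unrolled element-by-element on the tape.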
BTW, not sure how applicable it is to ReverseDiff, but Espresso.jl (the backbone of XDiff) can “unfuse” expressions automatically:
```julia
julia> using Espresso

julia> _, ex = funexpr(autoencoder_cost, map(typeof, vals))
(Symbol[:We1, :We2, :Wd, :b1, :b2, :x], quote
    firstLayer = logistic.(We1 * x .+ b1)
    encodedInput = logistic.(We2 * firstLayer .+ b2)
    reconstructedInput = logistic.(Wd * encodedInput)
    cost = sum((reconstructedInput .- x) .^ 2.0)
    cost
end)

julia> to_expr(ExGraph(ex))
quote
    tmp1235 = We1 * x
    tmp1236 = tmp1235 .+ b1
    firstLayer = Main.logistic.(tmp1236)
    tmp1238 = We2 * firstLayer
    tmp1239 = tmp1238 .+ b2
    encodedInput = Main.logistic.(tmp1239)
    tmp1241 = Wd * encodedInput
    reconstructedInput = Main.logistic.(tmp1241)
    tmp1243 = reconstructedInput .- x
    tmp1244 = 2.0
    tmp1245 = tmp1243 .^ tmp1244
    cost = sum(tmp1245)
end
```