I am a beginner in ML coming from the scientific computing field, and I am getting started with Flux.jl for a simple deep learning problem (not very deep, actually) using an MLP neural network (NN). As in the usual training approach with mini-batch stochastic gradient descent, I need to evaluate the gradient of the NN with respect to its parameters at several inputs in each epoch. I have tried both constructing a generic gradient function and evaluating the gradient at each specific input using the gradient function from Flux/Zygote. My question is that I don't see any advantage in evaluation time from the generic gradient function when it is evaluated at a new input value. Is there a faster way to do this, where the gradient information can be reused, given that the architecture of the NN is fixed? The related code snippets with benchmarks follow:
    using Flux, BenchmarkTools

    model = Chain(Dense(1,5,relu), Dense(5,5,relu), Dense(5,1,identity))
    eval_model(x) = model([x])[1]   # index the 1-element output so gradient sees a scalar
    par = Flux.params(model)
    x_test = 0.5;

    gr_generic(x) = gradient(() -> eval_model(x), par)
    gr_specific = gradient(() -> eval_model(x_test), par)

    # After two @btime runs to take away the overheads in all the below cases
    julia> @btime gr_specific = gradient(() -> eval_model(x_test), par)
      33.552 μs (311 allocations: 20.70 KiB)

    julia> @btime gr_generic(x_test)
      33.836 μs (313 allocations: 21.03 KiB)

    # Also evaluating the generic one at a new value to see if it can make any advantage,
    x_test_new = 0.3
    julia> @btime gr_generic(x_test_new)
      33.797 μs (313 allocations: 21.03 KiB)

    # compared to a new gr_specific evaluation below,
    julia> @btime gr_specific_new = gradient(() -> eval_model(x_test_new), par)
      33.742 μs (311 allocations: 20.70 KiB)
We can see that the two approaches take nearly the same time (the generic one is actually very slightly slower, with two extra allocations), so having a generic gradient function doesn't bring any advantage here when it is evaluated at a new input value. May I know which of the two is recommended? And if there is a way to reuse the gradient information so that the evaluations become faster, that would be really great!
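For context, the mini-batch use I have in mind could be sketched roughly as follows. This is only an illustration under assumptions, not my actual training code: the batch `xs`, the targets `ys`, and the mean-squared-error `loss` are made-up placeholders, and I am assuming a Flux version that accepts the `1 => 5` pair syntax and gradients taken with respect to the model itself rather than `Flux.params`.

```julia
using Flux

# Same architecture as in the snippets above
model = Chain(Dense(1 => 5, relu), Dense(5 => 5, relu), Dense(5 => 1, identity))

# Hypothetical mini-batch: 4 scalar inputs stored as a 1×4 matrix (one column per sample)
xs = Float32[0.1 0.3 0.5 0.7]
ys = Float32[0.2 0.6 1.0 1.4]   # made-up targets, for illustration only

# Hypothetical loss: mean squared error over the batch
loss(m, x, y) = sum(abs2, m(x) .- y) / length(y)

# A single gradient call covering the whole batch, instead of one call per input
grads = Flux.gradient(m -> loss(m, xs, ys), model)[1]

# The result mirrors the model structure, e.g. the first layer's weight gradient
println(size(grads.layers[1].weight))
```

Here the gradient is taken of a scalar loss over all columns of `xs` at once, which is the quantity a mini-batch update would need.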
Thanks a lot!