I am a beginner in ML, coming from the scientific computing field, and getting started with Flux.jl for a simple deep learning problem (not very deep, actually) with an MLP neural network (NN). As in the usual mini-batch stochastic gradient training approach, I need to evaluate the gradient of the NN with respect to its parameters at several inputs in each epoch. I have tried both constructing a generic gradient function and evaluating the gradient directly at each specific input, using the gradient function from Flux/Zygote. My question is that I don't see any advantage in evaluation time from the generic gradient function when it is evaluated at new input values. Is there a faster way to do this, where the gradient information can be reused, given that the architecture of the NN is fixed? Below are the related code snippets with benchmarks:
using Flux, BenchmarkTools
model = Chain(Dense(1,5,relu), Dense(5,5,relu), Dense(5,1,identity))
eval_model(x) = only(model([x]))  # return a scalar (not a 1-element vector) so the gradient is well-defined
par = Flux.params(model)
x_test = 0.5;
gr_generic(x) = gradient(() -> eval_model(x), par)
gr_specific = gradient(() -> eval_model(x_test), par)
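To give an idea of what I mean by evaluating at several inputs in each epoch, this is essentially how I call the generic gradient function over a set of inputs (the vector xs here is just a placeholder for my actual training inputs):

xs = rand(100)                         # placeholder for my real training inputs
grads = [gr_generic(x) for x in xs]    # one Zygote.Grads object per input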
# Timings below are taken after a couple of warm-up @btime runs, so compilation overhead is excluded in all cases
julia> @btime gr_specific = gradient(() -> eval_model(x_test), par)
33.552 μs (311 allocations: 20.70 KiB)
julia> @btime gr_generic(x_test)
33.836 μs (313 allocations: 21.03 KiB)
# Also evaluating the generic one at a new value, to see whether it gives any advantage:
x_test_new = 0.3
julia> @btime gr_generic(x_test_new)
33.797 μs (313 allocations: 21.03 KiB)
# compared to a new gr_specific evaluation below,
julia> @btime gr_specific_new = gradient(() -> eval_model(x_test_new), par)
33.742 μs (311 allocations: 20.70 KiB)
We can see that the two are nearly the same (the generic one is actually very slightly slower, with a few extra allocations), so having a generic gradient function brings no advantage here when it is evaluated at a new input value. May I know which of the two approaches is recommended? And if there is a way to reuse the gradient information so that the evaluations become faster, that would be really great!
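For context, here is roughly how these gradient evaluations end up being used in my training loop, sticking to the implicit-parameter API from above; the optimiser, the data (xs, ys), the quadratic loss, and the epoch count are all placeholders I made up for illustration:

opt = Descent(0.01)                       # plain gradient descent (placeholder choice)
xs = rand(100)                            # placeholder inputs
ys = sin.(2pi .* xs)                      # placeholder targets
loss(x, y) = (eval_model(x) - y)^2        # simple per-sample quadratic loss
for epoch in 1:10
    for (x, y) in zip(xs, ys)
        gs = gradient(() -> loss(x, y), par)  # gradient at this particular input
        Flux.Optimise.update!(opt, par, gs)   # in-place parameter update
    end
end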
Thanks a lot!