The computation itself is cheap enough that I’d expect it to be dominated by memory bandwidth if you calculate both the value and the gradient at the same time:
julia> using BenchmarkTools, Images, Interpolations
julia> function gradient_only(itp_img, locs)
           s = itp_img(locs[1]...)*0   # zero accumulator of the right color type
           @inbounds for loc in locs
               s += sum(Interpolations.gradient(itp_img, loc...))
           end
           return s
       end
gradient_only (generic function with 1 method)
julia> function gradient_value(itp_img, locs)
           s = itp_img(locs[1]...)*0
           @inbounds for loc in locs
               s += itp_img(loc...)
               s += sum(Interpolations.gradient(itp_img, loc...))
           end
           return s
       end
gradient_value (generic function with 1 method)
julia> img = rand(RGB{N0f8}, 1024, 1024);
julia> locs = collect(zip(rand(axes(img, 1), 2048), rand(axes(img, 2), 2048)));
julia> itp_img = linear_interpolation(axes(img), img);
julia> @btime gradient_only($itp_img, $locs)
69.400 μs (0 allocations: 0 bytes)
RGB{Float64}(-1.1529411764705895,-18.10980392156866,-1.8392156862744766)
julia> @btime gradient_value($itp_img, $locs)
65.600 μs (0 allocations: 0 bytes)
RGB{Float64}(1030.321568627451,1024.2039215686275,1020.3254901960775)
Strange - calculating the value and the gradient together was slightly faster than calculating the gradient alone! That's probably a benchmarking artifact, since the two timings are only about 5% apart.
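If you want to check whether that difference is just noise, one option is to collect many samples of each function and compare them with BenchmarkTools' `judge`. This is only a sketch: it redefines the same two functions so it runs on its own, and the sample count, time budget, and 10% tolerance are arbitrary choices of mine, not anything tuned.

```julia
using BenchmarkTools, Images, Interpolations

# Same two kernels as in the transcript above.
function gradient_only(itp_img, locs)
    s = itp_img(locs[1]...) * 0
    @inbounds for loc in locs
        s += sum(Interpolations.gradient(itp_img, loc...))
    end
    return s
end

function gradient_value(itp_img, locs)
    s = itp_img(locs[1]...) * 0
    @inbounds for loc in locs
        s += itp_img(loc...)
        s += sum(Interpolations.gradient(itp_img, loc...))
    end
    return s
end

img = rand(RGB{N0f8}, 1024, 1024)
locs = collect(zip(rand(axes(img, 1), 2048), rand(axes(img, 2), 2048)))
itp_img = linear_interpolation(axes(img), img)

# Gather up to 1000 samples per function (5 s budget each) instead of
# relying on a single @btime reading. These limits are arbitrary.
t_only = @benchmark gradient_only($itp_img, $locs) samples=1000 seconds=5
t_both = @benchmark gradient_value($itp_img, $locs) samples=1000 seconds=5

# Compare the minimum times; differences inside the tolerance are
# reported as :invariant. The 10% tolerance is an assumption.
j = judge(minimum(t_both), minimum(t_only); time_tolerance = 0.10)
println(j)
```

If `judge` reports the time as invariant, the extra value lookup is not measurably changing the cost on your machine at this tolerance.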