I am trying to implement a statistical estimator and am running into a problem: both ForwardDiff and ReverseDiff take longer to compute gradients than simple one-sided finite differencing. Below is a minimal example, with `logit_GMM` being the function I am trying to minimize. With 20 parameters, the ForwardDiff gradient evaluation takes more than 25x the function evaluation time (when optimally chunked), and ReverseDiff takes closer to 400x! By comparison, finite differencing takes less than 20x, and direct (analytic) gradient computation takes even less. (My actual application is more complex, and direct gradient computation is much less feasible there.)

Am I doing something wrong, or is this problem just poorly suited to either forward- or reverse-mode automatic differentiation? Any thoughts on how to make my AD faster would be greatly appreciated. (Any other comments that might make my code more efficient are certainly welcome as well.)

Thank you!

```
using ForwardDiff, BenchmarkTools, ReverseDiff
num_i = 1_000
choice_set_size = 30
true_b = randn(20)
num_moments = 40
X = randn(num_i*choice_set_size, length(true_b))
Zm_mat = rand(num_i*choice_set_size, num_moments)
ranges = [(1 + choice_set_size*(ii-1)):(choice_set_size*ii) for ii = 1:num_i]
function logit_GMM(b, X::Matrix{Float64}, Zm_mat::Matrix{Float64}, ranges::Vector{UnitRange{Int64}})
    eu = exp.(X*b)
    s = similar(eu)
    @inbounds for rng in ranges
        @views s[rng] .= eu[rng] ./ sum(eu[rng])
    end
    EZm = (Zm_mat' * s) ./ length(ranges)
    return EZm' * EZm
end
logit_GMM(b) = logit_GMM(b, X, Zm_mat, ranges)
b=randn(size(true_b))
const f_tape = ReverseDiff.GradientTape(logit_GMM,b)
const compiled_f_tape = ReverseDiff.compile(f_tape)
results = similar(b)
cfg_1 = ForwardDiff.GradientConfig(logit_GMM,b,ForwardDiff.Chunk{1}())
cfg_10 = ForwardDiff.GradientConfig(logit_GMM,b,ForwardDiff.Chunk{10}())
cfg_20 = ForwardDiff.GradientConfig(logit_GMM,b,ForwardDiff.Chunk{20}())
println("Eval-times:")
@btime logit_GMM($b, $X, $Zm_mat, $ranges)
@btime logit_GMM($b)
println("Grad-times:")
@btime ForwardDiff.gradient!($results,$logit_GMM,$b)
@btime ForwardDiff.gradient!($results,$logit_GMM,$b,$cfg_1)
@btime ForwardDiff.gradient!($results,$logit_GMM,$b,$cfg_10)
@btime ForwardDiff.gradient!($results,$logit_GMM,$b,$cfg_20)
@btime ReverseDiff.gradient!($results,$compiled_f_tape,$b)
```
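For reference, the one-sided finite-difference comparison mentioned above was along these lines (a minimal sketch, not my exact code; the function name `fd_gradient!` and the step size `h` are my choices here):

```julia
# One-sided (forward) finite-difference gradient: p+1 function evaluations.
# Step size sqrt(eps) is a common default, not tuned for this problem.
function fd_gradient!(g, f, b; h = sqrt(eps(Float64)))
    f0 = f(b)
    bp = copy(b)
    @inbounds for k in eachindex(b)
        bp[k] = b[k] + h
        g[k] = (f(bp) - f0) / h
        bp[k] = b[k]          # restore before perturbing the next coordinate
    end
    return g
end
```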

The benchmarked times are:

```
Eval-times:
1.043 ms (1008 allocations: 751.11 KiB)
1.040 ms (1008 allocations: 751.11 KiB)
Grad-times:
32.826 ms (2018 allocations: 15.21 MiB)
126.079 ms (20160 allocations: 28.41 MiB)
32.621 ms (2016 allocations: 15.21 MiB)
26.231 ms (1009 allocations: 14.48 MiB)
384.239 ms (0 allocations: 0 bytes)
```