So, I am trying to understand what impacts the speed of calculating the derivative of a function using ForwardDiff. I am evaluating the mean of several simulated paths of a stochastic variable. However, evaluating the derivative of the function calculating the mean with respect to some parameter a is 5x as slow as just evaluating the mean function it self, but from @btime I see that memory allocation is just ~2x. Thus, I am wondering what causes is to be 5x as slow? (I have read elsewhere that I should only expect about x2 slowdown).

```
a = -0.43;
@btime mean = computing_volterra_mean(a)
g(x) = ForwardDiff.derivative(computing_volterra_mean, x)
@btime deriv_mean = g(a)
```

112.625 ms (48002 allocations: 36.90 MiB)

552.525 ms (75011 allocations: 64.84 MiB)

Code:

```
using ForwardDiff
using DualNumbers
using BenchmarkTools
using DiffBase
> function b(a, k)
> return (k^(a+1.0)-(k-1.0)^(a+1.0))/(a+1.0);
> end
>
> function get_sigma_Z1(n)
> return 1.0/sqrt(n);
> end
>
> function get_sigma_Z2(a,n)
> return 1.0/sqrt(((2.0*a+1.0)*n^(2.0*a+1.0)))
> end
>
> function get_rho(a)
> return sqrt(2.0*a+1.0)/(a+1.0)
> end
>
> function gen_G(a,n,s)
> G =zeros(eltype(a),s+1);
> for i =2:s+1
> G[i] = b(a,Float64(i))/(n^a)
> end
> return G;
> end
>
> function direct_convolution(x, y, s, a)
> c = zeros(eltype(a),2*s)
> @inbounds @simd for j=1:s
> @inbounds @simd for k=1:(s+1)
> c[j+k-1] = c[j+k-1]+ x[j]*y[k]
> end
> end
> return c
> end
>
> function gen_volterra(a,s,n,G,rand_1,rand_2,rho, sigma_Z1,sigma_Z2)
> Z_1 = zeros(eltype(a),s); #1st normal
> Z_2 = zeros(eltype(a),s); # 2nd normal
>
> @inbounds @simd for i = 1:s
> Z_1[i] = sigma_Z1*rand_1[i] #set correct variance of random variate
> Z_2[i] = sigma_Z2*(rho*rand_1[i]+sqrt(1.0-rho^2.0)*rand_2[i]); #set correct variance of random variate
> end
> return [0;(direct_convolution(Z_1,G,s,a)[1:s]+Z_2)];
> end
>
> function computing_volterra_mean(a)
> num_paths =1000;
> n =500;
> s =500;
>
> G = gen_G(a,n,s); #Generate kernel
> rho = get_rho(a); # precomute rho, and stddevs
> sigma_Z1 =get_sigma_Z1(n);
> sigma_Z2 = get_sigma_Z2(a,n)
> result =zero(a);
>
> @inbounds @simd for i =1:num_paths
> rand1 =randn(s);
> rand2 =randn(s);
> result+=gen_volterra(a,s,n,G,rand1,rand2,rho,sigma_Z1,sigma_Z2)[s+1]
> end
>
> return result/num_paths;
> end
```