Ok, so I ran this code:
using Printf               # for @printf
using LogDensityProblems   # logdensity, logdensity_and_gradient
# LogDensityFunction comes from DynamicPPL (re-exported by Turing),
# and ADgradient from LogDensityProblemsAD.

function testgradient(model, init)
    vec = init.values.array
    #vec = vec .+ 0.01 * rand(length(vec))
    ld = LogDensityFunction(model)
    ldg = ADgradient(:ForwardDiff, ld)
    (ldval, ldgrad) = LogDensityProblems.logdensity_and_gradient(ldg, vec)
    @show (ldval, ldgrad)
    println("testing gradient at initial condition: ")
    @show init.values.array
    for i in 1:length(vec)
        dx = 1e-7
        dvec = copy(vec)
        dvec[i] = dvec[i] + dx
        lddy = LogDensityProblems.logdensity(ldg, dvec)
        gradi = (lddy - ldval) / dx
        @printf("gradient estimate for dimension %d is %.3f , ADest = %.3f, relerr = %.3f\n",
                i, gradi, ldgrad[i], (gradi - ldgrad[i]) / ldgrad[i])
    end
    println("testing at perturbed condition: ")
    vec = init.values.array .+ 0.01 * rand(length(vec))
    for i in 1:length(vec)
        dx = 1e-7
        dvec = copy(vec)
        dvec[i] = dvec[i] + dx
        lddy = LogDensityProblems.logdensity(ldg, dvec)
        # note: ldval and ldgrad below are still the values from the unperturbed point
        gradi = (lddy - ldval) / dx
        @printf("gradient estimate for dimension %d is %.3f , ADest = %.3f, relerr = %.3f\n",
                i, gradi, ldgrad[i], (gradi - ldgrad[i]) / ldgrad[i])
    end
end
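(The one-sided 1e-7 step above is fairly noisy; a sturdier baseline would be a central difference, sketched here with FiniteDifferences.jl, which isn't part of my actual code:)

using FiniteDifferences
# central-difference gradient of the same log density at the same point
fdgrad = FiniteDifferences.grad(central_fdm(5, 1),
                                v -> LogDensityProblems.logdensity(ldg, v), vec)[1]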
At the optimum found by BOBYQA I get:
gradient estimate for dimension 1 is 17704.084 , ADest = NaN, relerr = NaN
gradient estimate for dimension 2 is 17875.809 , ADest = NaN, relerr = NaN
gradient estimate for dimension 3 is 21775.342 , ADest = NaN, relerr = NaN
gradient estimate for dimension 4 is 6974.457 , ADest = NaN, relerr = NaN
gradient estimate for dimension 5 is -229.566 , ADest = NaN, relerr = NaN
gradient estimate for dimension 6 is -6428.048 , ADest = NaN, relerr = NaN
gradient estimate for dimension 7 is 34741.851 , ADest = NaN, relerr = NaN
gradient estimate for dimension 8 is 18674.355 , ADest = NaN, relerr = NaN
gradient estimate for dimension 9 is 26003.116 , ADest = NaN, relerr = NaN
gradient estimate for dimension 10 is 26944.482 , ADest = NaN, relerr = NaN
gradient estimate for dimension 11 is 10235.465 , ADest = NaN, relerr = NaN
Then, at the perturbed value, where I add a small random perturbation to every component (the second part of the function), I get:
testing at perturbed condition:
gradient estimate for dimension 1 is 17297751523.070 , ADest = NaN, relerr = NaN
gradient estimate for dimension 2 is 17297750512.030 , ADest = NaN, relerr = NaN
gradient estimate for dimension 3 is 17297754910.220 , ADest = NaN, relerr = NaN
gradient estimate for dimension 4 is 17297738140.167 , ADest = NaN, relerr = NaN
gradient estimate for dimension 5 is 17297733625.846 , ADest = NaN, relerr = NaN
gradient estimate for dimension 6 is 17297727349.401 , ADest = NaN, relerr = NaN
gradient estimate for dimension 7 is 17297768334.215 , ADest = NaN, relerr = NaN
gradient estimate for dimension 8 is 17297750531.370 , ADest = NaN, relerr = NaN
gradient estimate for dimension 9 is 17297758827.056 , ADest = NaN, relerr = NaN
gradient estimate for dimension 10 is 17297756425.504 , ADest = NaN, relerr = NaN
gradient estimate for dimension 11 is 17297743769.864 , ADest = NaN, relerr = NaN
gradient estimate for dimension 12 is 17297735312.730 , ADest = NaN, relerr = NaN
gradient estimate for dimension 13 is 17297731805.592 , ADest = NaN, relerr = NaN
umm… wow?
Then I tried changing the code to use an even smaller, normally distributed perturbation:
println("testing at perturbed condition: ")
vec = init.values.array .+ rand(Normal(0.0,.001),length(vec))
testing at perturbed condition:
gradient estimate for dimension 1 is -28645839.783 , ADest = NaN, relerr = NaN
gradient estimate for dimension 2 is -28645392.270 , ADest = NaN, relerr = NaN
gradient estimate for dimension 3 is -28641709.860 , ADest = NaN, relerr = NaN
gradient estimate for dimension 4 is -28656684.783 , ADest = NaN, relerr = NaN
gradient estimate for dimension 5 is -28663605.792 , ADest = NaN, relerr = NaN
gradient estimate for dimension 6 is -28669807.431 , ADest = NaN, relerr = NaN
gradient estimate for dimension 7 is -28628833.286 , ADest = NaN, relerr = NaN
So the gradients exist, but AD doesn't calculate them; no surprise, then, that gradient-based methods don't work.
Why doesn’t ForwardDiff calculate gradients? What can cause that to go wrong?
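(One failure mode I'm aware of is a primitive whose derivative is undefined at the exact evaluation point, e.g. a norm at the origin. A minimal illustration, not my model:)

using ForwardDiff
g(v) = sqrt(v[1]^2 + v[2]^2)         # smooth everywhere except the origin
ForwardDiff.gradient(g, [0.0, 0.0])  # returns [NaN, NaN]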
I decided to try out AutoReverseDiff(false) with the normal perturbation (the backend swap is sketched below):
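Roughly, swapping in the uncompiled ReverseDiff backend looks like this (exact keywords may differ between LogDensityProblemsAD versions, so treat it as a sketch):

ldg_rd = ADgradient(:ReverseDiff, ld)   # uncompiled tape, i.e. AutoReverseDiff(false)
LogDensityProblems.logdensity_and_gradient(ldg_rd, vec)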
gradient estimate for dimension 1 is 17704.084 , ADest = 17704.086, relerr = -0.000
gradient estimate for dimension 2 is 17875.809 , ADest = 17875.812, relerr = -0.000
gradient estimate for dimension 3 is 21775.342 , ADest = 21775.345, relerr = -0.000
gradient estimate for dimension 4 is 6974.457 , ADest = 6974.464, relerr = -0.000
gradient estimate for dimension 5 is -229.566 , ADest = -229.564, relerr = 0.000
gradient estimate for dimension 6 is -6428.048 , ADest = -6428.046, relerr = 0.000
gradient estimate for dimension 7 is 34741.851 , ADest = 34741.855, relerr = -0.000
gradient estimate for dimension 8 is 18674.355 , ADest = 18674.362, relerr = -0.000
gradient estimate for dimension 9 is 26003.116 , ADest = 26003.124, relerr = -0.000
gradient estimate for dimension 10 is 26944.482 , ADest = 26944.499, relerr = -0.000
gradient estimate for dimension 11 is 10235.465 , ADest = 10235.470, relerr = -0.000
gradient estimate for dimension 12 is 1808.442 , ADest = 1808.446, relerr = -0.000
gradient estimate for dimension 13 is -2605.592 , ADest = -2605.592, relerr = 0.000
gradient estimate for dimension 14 is -817.344 , ADest = -817.344, relerr = 0.000
gradient estimate for dimension 15 is -1745.760 , ADest = -1745.760, relerr = 0.000
gradient estimate for dimension 16 is -1324.381 , ADest = -1324.381, relerr = -0.000
gradient estimate for dimension 17 is -706.838 , ADest = -706.838, relerr = 0.000
gradient estimate for dimension 18 is -211.529 , ADest = -211.529, relerr = 0.000
gradient estimate for dimension 19 is 44712.491 , ADest = 44712.499, relerr = -0.000
gradient estimate for dimension 20 is 42743.151 , ADest = 42743.166, relerr = -0.000
gradient estimate for dimension 21 is 50166.946 , ADest = 50166.962, relerr = -0.000
gradient estimate for dimension 22 is 35201.962 , ADest = 35201.991, relerr = -0.000
gradient estimate for dimension 23 is 6663.269 , ADest = 6663.279, relerr = -0.000
gradient estimate for dimension 24 is -7842.856 , ADest = -7842.848, relerr = 0.000
gradient estimate for dimension 25 is -389.426 , ADest = -389.425, relerr = 0.000
gradient estimate for dimension 26 is 123.694 , ADest = 123.694, relerr = 0.000
gradient estimate for dimension 27 is -428.161 , ADest = -428.161, relerr = 0.000
gradient estimate for dimension 28 is 1898.894 , ADest = 1898.895, relerr = -0.000
gradient estimate for dimension 29 is 456.530 , ADest = 456.531, relerr = -0.000
gradient estimate for dimension 30 is -1991.917 , ADest = -1991.917, relerr = 0.000
gradient estimate for dimension 31 is -378.437 , ADest = -378.437, relerr = -0.000
gradient estimate for dimension 32 is -304.205 , ADest = -304.205, relerr = 0.000
gradient estimate for dimension 33 is -0.038 , ADest = NaN, relerr = NaN
gradient estimate for dimension 34 is -54.432 , ADest = NaN, relerr = NaN
gradient estimate for dimension 35 is 67.982 , ADest = NaN, relerr = NaN
gradient estimate for dimension 36 is -46.776 , ADest = NaN, relerr = NaN
gradient estimate for dimension 37 is 16.698 , ADest = NaN, relerr = NaN
gradient estimate for dimension 38 is 18.293 , ADest = NaN, relerr = NaN
gradient estimate for dimension 39 is 76.991 , ADest = NaN, relerr = NaN
gradient estimate for dimension 40 is -24.460 , ADest = NaN, relerr = NaN
gradient estimate for dimension 41 is 50.048 , ADest = NaN, relerr = NaN
gradient estimate for dimension 42 is -245.711 , ADest = NaN, relerr = NaN
gradient estimate for dimension 43 is 101.773 , ADest = NaN, relerr = NaN
So it calculates SOME of the gradient dimensions but not others.
How about at the perturbed location? There are MASSIVE relative errors between the finite differences and AutoReverseDiff, but it still has NaN from the 33rd dimension onward:
testing at perturbed condition:
gradient estimate for dimension 1 is -438338326.307 , ADest = 17704.086, relerr = -24760.162
gradient estimate for dimension 2 is -438338283.146 , ADest = 17875.812, relerr = -24522.307
gradient estimate for dimension 3 is -438334189.793 , ADest = 21775.345, relerr = -20130.839
gradient estimate for dimension 4 is -438349504.072 , ADest = 6974.464, relerr = -62851.636
gradient estimate for dimension 5 is -438356333.853 , ADest = -229.564, relerr = 1909511.965
gradient estimate for dimension 6 is -438362551.859 , ADest = -6428.046, relerr = 68194.309
gradient estimate for dimension 7 is -438321282.970 , ADest = 34741.855, relerr = -12617.519
gradient estimate for dimension 8 is -438337508.705 , ADest = 18674.362, relerr = -23473.690
gradient estimate for dimension 9 is -438329918.286 , ADest = 26003.124, relerr = -16857.818
gradient estimate for dimension 10 is -438329783.466 , ADest = 26944.499, relerr = -16268.876
gradient estimate for dimension 11 is -438345883.202 , ADest = 10235.470, relerr = -42827.161
gradient estimate for dimension 12 is -438354309.549 , ADest = 1808.446, relerr = -242393.802
gradient estimate for dimension 13 is -438358719.526 , ADest = -2605.592, relerr = 168236.655
gradient estimate for dimension 14 is -438356918.963 , ADest = -817.344, relerr = 536317.929
gradient estimate for dimension 15 is -438357869.928 , ADest = -1745.760, relerr = 251097.580
gradient estimate for dimension 16 is -438357376.581 , ADest = -1324.381, relerr = 330989.418
gradient estimate for dimension 17 is -438356813.879 , ADest = -706.838, relerr = 620164.968
gradient estimate for dimension 18 is -438356319.889 , ADest = -211.529, relerr = 2072324.635
gradient estimate for dimension 19 is -438311174.549 , ADest = 44712.499, relerr = -9803.878
gradient estimate for dimension 20 is -438313499.289 , ADest = 42743.166, relerr = -10255.587
gradient estimate for dimension 21 is -438305534.833 , ADest = 50166.962, relerr = -8737.936
The coefficients from dimension 33 onward are used to construct ApproxFun Funs, i.e. smooth Chebyshev polynomials in space that adjust the predictions according to the spatial location of the flows we're estimating.
So apparently you can't differentiate through Chebyshev polynomials? That seems odd, because the polynomials are smooth functions of their coefficients.
EDIT:
At its core, ApproxFun doesn't seem to be a major problem for ForwardDiff?
using ApproxFun, ForwardDiff

function f(v)
    (a, b, c, x) = v
    G = Fun(Chebyshev(), [a, b, c])   # Chebyshev series with coefficients a, b, c
    G(x)                              # evaluate it at x
end

x = [1.0, 1.0, 2.0, 0.5]
f(x)
ForwardDiff.gradient(f, x)
Gives what looks like sensible answers.
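For this toy case the answer can also be worked out by hand, since the series is linear in its coefficients (my own arithmetic, not output from the code):

# T0(x) = 1, T1(x) = x, T2(x) = 2x^2 - 1, so
#   f(a, b, c, x) = a + b*x + c*(2x^2 - 1)
# and the gradient is
#   [1, x, 2x^2 - 1, b + 4c*x]
# which at (a, b, c, x) = (1.0, 1.0, 2.0, 0.5) is
#   [1.0, 0.5, -0.5, 5.0]
# and that is what ForwardDiff.gradient(f, x) should return (up to rounding).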