While implementing an online optimization method (with a for loop that accesses new data at each step), I am using HesVec() and JacVec() from SparseDiffTools.jl, each with the option autodiff=false, to compute sparse Hessian and Jacobian matrices. Inside the same loop, I also have to compute gradients, this time with sparse(ForwardDiff.gradient(objectivefn, somevector)). I assume each of these autodiff tools has to evaluate the objective every time it is called.

I do not get any error from HesVec() and JacVec() (so I suppose they are fine), but after the first step (the first data point), ForwardDiff.gradient() throws a conversion error at the line where Flux.Losses.logitcrossentropy(yhat, yi) is called in the loss function; apparently something is trying to convert a ForwardDiff.Dual to an AbstractFloat. One guess is that the gradient computation produces NaN values somewhere, which Julia reports as #unused# (I don’t know), as shown in the error text below. I find it difficult to trace this error to its exact cause. Could it be that ForwardDiff.gradient() is unable to handle the sparsity, or the large values, of the input vector after the first update?
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{Main.ModuleA.var"#4#6"{Matrix{Float32}, Main.ModuleA.var"#3#5"{Flux.var"#64#66"{Vector{AbstractArray{Float32}}}, Int64}, Flux.var"#64#66"{Vector{AbstractArray{Float32}}}, Int64}, Float32}, Float64, 12})
Closest candidates are:
(::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat at C:\Users\user\AppData\Local\Programs\Julia-1.7.1\share\julia\base\rounding.jl:200
(::Type{T})(::T) where T<:Number at C:\Users\user\AppData\Local\Programs\Julia-1.7.1\share\julia\base\boot.jl:770
(::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number} at C:\Users\user\AppData\Local\Programs\Julia-1.7.1\share\julia\base\char.jl:50
...
Stacktrace:
[1] convert(#unused#::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{Main.ModuleA.var"#4#6"{Matrix{Float32}, Main.ModuleA.var"#3#5"{Flux.var"#64#66"{Vector{AbstractArray{Float32}}}, Int64}, Flux.var"#64#66"{Vector{AbstractArray{Float32}}}, Int64}, Float32}, Float64, 12})
@ Base .\number.jl:7
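For reference, here is a minimal sketch of one way this kind of MethodError can arise. It assumes (I have not confirmed this for my own code) that somewhere a ForwardDiff.Dual is written into a container whose element type is fixed to Float64. The functions loss_with_buffer and loss_generic and the buffer buf are hypothetical stand-ins, not my actual loss.

```julia
using ForwardDiff

# Hypothetical preallocated buffer with a fixed Float64 element type.
const buf = zeros(Float64, 3)

# Hypothetical loss that stores intermediate values in the Float64 buffer.
# Under ForwardDiff, x holds Dual numbers, so the assignment has to convert
# Dual -> Float64, which is not defined.
function loss_with_buffer(x)
    buf .= x .^ 2
    return sum(buf)
end

# Same computation, but the temporary takes its element type from the input,
# so it can hold Dual numbers as well as plain floats.
function loss_generic(x)
    tmp = similar(x)
    tmp .= x .^ 2
    return sum(tmp)
end

x = [1.0, 2.0, 3.0]

# Capture the exception instead of letting it abort the script.
err = try
    ForwardDiff.gradient(loss_with_buffer, x)
    nothing
catch e
    e
end
println(typeof(err))

# The type-generic version differentiates without a conversion error.
g = ForwardDiff.gradient(loss_generic, x)
println(g)
```

If my real loss behaves like loss_with_buffer, the cause would be a fixed element type somewhere in the objective rather than NaN values or sparsity, but that is only a guess.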
Separately, I would like to know whether there is a way to “record operations for autodiff”, as in TensorFlow’s GradientTape, so that I could call my objective function only once and use its value to compute the various derivatives with respect to a “watched” variable. I think this would also make the error easier to trace, since I suspect Julia’s autodiff could be changing the type of the input vector somehow.
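The closest thing I have found so far is the DiffResults package used together with ForwardDiff, which returns the objective value, gradient, and Hessian from a single call. It is not a tape, and I am not sure it fits the sparse setting, so the following is only a sketch with a hypothetical objective f; it also assumes DiffResults is installed in the active environment.

```julia
using ForwardDiff, DiffResults

# Hypothetical objective, used only for illustration.
f(x) = sum(abs2, x) + prod(x)

x = [1.0, 2.0, 3.0]

# Allocate storage for the value, gradient, and Hessian at x.
result = DiffResults.HessianResult(x)

# One call fills all three fields of the result.
result = ForwardDiff.hessian!(result, f, x)

val  = DiffResults.value(result)     # objective value f(x)
grad = DiffResults.gradient(result)  # gradient of f at x
hess = DiffResults.hessian(result)   # dense Hessian of f at x

println(val)
println(grad)
println(hess)
```

This still produces a dense Hessian, so it does not replace HesVec() and JacVec(); it only shows the “evaluate once, read several derivatives” pattern I am asking about.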
Thank you.