Flux differentiation error

Honza9723 · November 8, 2020, 11:52pm

Dear All,

I am working on a solution of simple differential equation y(x) - y’(x) = 0 using neural network. I am using a two-layer softplus network for parameterization of y(x). To get y’(x), I am using simple Flux.gradient. Gradient itself works well (d𝛰.(xGrid) generate derivative at all grid points for example), however, when I put it into the loss function and run Flux.train! loop, I get following error

" Can’t differentiate foreigncall expression"

Could somebody please give me some guidance how to fix this problem?

Best,
Honza

#(B) Solve simple differential equation
#(1) Generate grid
xGrid = sort(rand(Uniform(-1,1),1,ϰ),dims=1)

#(2) Build neural network and its derivative
Ο = Flux.Chain(Dense(1,16,softplus),Dense(16,1,softplus))
𝛰(x) = Ο([x])[1]
d𝛰(x) = Flux.gradient(𝛰,x)[1]


#(3) Build loss function
function 𝕰(x)
    𝕰 = sum((𝛰.(x) .- d𝛰.(x)).^2)
    𝕭 = (𝛰(1)-1).^2
    𝕷 = 𝕰+𝕭
    return 𝕷
end

𝜣 = Flux.params(Ο)
Data = [xGrid]
opt = ADAM(0.13)

cb = () -> println(𝕰(xGrid))
@time Flux.@epochs 5000 Flux.train!(𝕰,𝜣,Data,opt,cb=cb)

Honza9723 · November 9, 2020, 12:54am

To be more specific, I would like to ask, how to take a derivative of a neural network, such that Flux is able to differentiate it again.

Edit: I tried to get around this problem by computing the derivative using ForwardDiff. I computed the first derivative of neural network using ForwardDiff, and then computed second derivative using Flux.gradient on the ForwardDiff derivative as a trial. It worked without any problems. However, when I plugged the ForwardDiff derivative into the loss function (loss function itself works) and tried to optimize it using Flux.train! loop, I get following error:

TypeError: in typeassert, expected Float32, got ForwardDiff.Dual{Nothing,Float32,1}
in top-level scope at base\util.jl:175
in macro expansion at Juno\n6wyj\src\progress.jl:119
in macro expansion at Flux\Fj3bt\src\optimise\train.jl:122
in  at Flux\Fj3bt\src\optimise\train.jl:79
in #train!#12 at Flux\Fj3bt\src\optimise\train.jl:81
in macro expansion at Juno\n6wyj\src\progress.jl:119 
in macro expansion at Flux\Fj3bt\src\optimise\train.jl:92 
in update! at Flux\Fj3bt\src\optimise\train.jl:31
in update! at Flux\Fj3bt\src\optimise\train.jl:25
in apply! at Flux\Fj3bt\src\optimise\optimisers.jl:175
in materialize! at base\broadcast.jl:823
in copyto! at base\broadcast.jl:864 
in copyto! at base\broadcast.jl:909 
in macro expansion at base\simdloop.jl:77 
in macro expansion at base\broadcast.jl:910 
in setindex! at base\multidimensional.jl:545 
in setindex! at base\array.jl:828

Code

#(B) Solve simple differential equation
#(1) Generate grid
xGrid = sort(rand(Uniform(-1,1),1,ϰ),dims=1)

#(2) Build neural network and its derivative
Ο = Flux.Chain(Dense(1,16,softplus),Dense(16,1,softplus))
ο(t) = Ο([t])[1]
dο(t) = ForwardDiff.derivative(ο,t)[1]
ddο(t) = Flux.gradient(dο,t)[1]

#(3) Build loss function
function 𝕰(x)
    𝕰 = sum((ο.(x) .- dο.(x)).^2)
    𝕭 = (ο(1)-1).^2
    𝕷 = 𝕰+𝕭
    return 𝕷
end

𝕰(xGrid)


𝜣 = Flux.params(Ο)
Data = [xGrid]
opt = ADAM(0.13)

cb = () -> println(𝕰(xGrid))
@time Flux.@epochs 5000 Flux.train!(𝕰,𝜣,Data,opt,cb=cb)

Edit 2: I made ddο(t) work using fix suggested by @ChrisRackauckas
https://github.com/SciML/DiffEqFlux.jl/blob/v1.7.0/src/DiffEqFlux.jl#L53-L64

# ForwardDiff integration

ZygoteRules.@adjoint function ForwardDiff.Dual{T}(x, ẋ::Tuple) where T
  @assert length(ẋ) == 1
  ForwardDiff.Dual{T}(x, ẋ), ḋ -> (ḋ.partials[1], (ḋ.value,))
end

ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:partials}) where T =
  d.partials, ṗ -> (ForwardDiff.Dual{T}(ṗ[1], 0),)

ZygoteRules.@adjoint ZygoteRules.literal_getproperty(d::ForwardDiff.Dual{T}, ::Val{:value}) where T =
  d.value, ẋ -> (ForwardDiff.Dual{T}(0, ẋ),)

However, I can’t still get gradient of LossFunction, I am getting following error:
MethodError: no method matching Float32(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float64},Float64,1})

Does somebody have an idea, how to fix it (use ForwardDiff/other autodiff in loss function, and still be able to train it in Flux)?

ChrisRackauckas · November 9, 2020, 2:07pm

You’re mixing Float64 and Float32: you might want to make everything Float32.

Honza9723 · November 9, 2020, 2:10pm

Dear Chris, thank you. So, I should convert xGrid to Float32? Is it a problem of input (xGrid), or do I need to fix something inside loss function?

ChrisRackauckas · November 9, 2020, 2:12pm

Yes, you probably want to convert what comes out of rand to a Float32 (or make it directly sample Float32s).

Honza9723 · November 9, 2020, 2:14pm

I tried that, and I still get this error

MethodError: no method matching Float32(::ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float32},Float64,1})

Code

#(B) Solve simple differential equation
#(1) Generate grid
xGrid = sort(rand(Uniform(-1,1),1,ϰ),dims=1)

xGrid = convert(Array{Float32},xGrid)

#(2) Build neural network and its derivative
Ο = Flux.Chain(Dense(1,16,softplus),Dense(16,1,softplus))
ο(t) = Ο([t])[1]
dο(t) = ForwardDiff.derivative(ο,t)
ddο(t) = Flux.gradient(dο,t)[1]

dο(5)

ddο(5)

ddο.(xGrid)

#(3) Build loss function
function 𝕰(x)
    𝕰 = sum((ο.(x) .- dο.(x)).^2)
    𝕭 = (ο(1)-1).^2
    𝕷 = 𝕰+𝕭
    return 𝕷
end

𝕰(xGrid)

𝜣 = Flux.params(Ο)
Data = [xGrid]
opt = ADAM(0.13)

cb = () -> println(𝕰(xGrid))
@time Flux.@epochs 5000 Flux.train!(𝕰,𝜣,Data,opt,cb=cb)

Honza9723 · November 9, 2020, 5:19pm

Hi, sorry for spamming with this problem. I differentiated my loss function with respect to parameters of the neural network and get the following gradient. So, Zygote was able to differentiate ForwardDiff stuff.

∇𝕰 = Flux.gradient(()->𝕰(xGrid),𝜣)
∇𝕰.grads

However, the gradient had the following structure (include dual numbers, hence update of parameters fails in Flux.train! loop). It looks like the fix suggested by @ChrisRackauckas doesn’t work. Any idea, how to fix this?

IdDict{Any, Any} with 5 entries
16×1 Array{Float32,2}:
 => 16×1 Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float32},Float32,1},2}:
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.2634682,1271.0961)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.036139756,-196.31708)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.2237638,-1756.2749)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.14364085,-776.0406)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.0034865336,-18.079405)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.27216268,-1725.1829)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.07260824,479.56946)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.020844292,-140.64598)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.23060206,1787.3832)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.17264761,-1272.322)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.33930257,-1820.1713)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.056477264,478.3624)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.3520169,-1696.228)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.039859835,272.59702)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.19189215,-1461.739)
 Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.08595074,513.59863)
Vector{Float32} with 16 elements
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
0.00
 => 
Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float32},Float32,1}} with 16 elements
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.26234788,357.27512)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.03616306,-26.888998)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.22294879,373.919)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.14253888,-111.39714)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.0034794048,-3.4679422)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.27468497,44.10873)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.07365812,-31.0434)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.020888403,11.870704)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.23005879,-363.95084)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.17167568,205.64714)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.33812273,-275.50446)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.05754064,-129.68915)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.35195476,-477.36105)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.04028002,-26.315512)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(0.19163737,276.66525)
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(-0.08639604,17.339998)
1×16 Array{Float32,2}:
 -0.407974  0.0630915  0.566018  0.249386  …  -0.0877669  0.470995  -0.165193
 => 1×16 Array{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float32},Float32,1},2}:
Vector{Float32} with 1 element
0.00
 => 
Vector{ForwardDiff.Dual{ForwardDiff.Tag{typeof(ο),Float32},Float32,1}} with 1 element
Dual{ForwardDiff.Tag{typeof(ο),Float32}}(1.009891,-46.851746)
:(Main.xGrid) => 1×2500 Array{Float32,2}:

Do I understand it correctly, that it computed gradient correctly, and I need just to “de-dualize” it? Is there some easy way, how to do it, ideally such that Flux.train! loop would work?

Honza9723 · November 10, 2020, 4:41pm

Hi, any guidance on what to do with this problem? Sorry for spamming with this, but I desperately need advice. I think it should be something simple, but I can’t find the solution. @ChrisRackauckas @MikeInnes

ChrisRackauckas · November 12, 2020, 1:50pm

Yes, GalacticOptim.jl actually hard codes the workaround:

github.com

SciML/Optimization.jl/blob/master/src/function.jl#L174


      
          
          function symbolify(e::Expr)
              if !(e.args[1] isa Symbol)
                  e.args[1] = Symbol(e.args[1])
              end
              symbolify.(e.args)
              return e
          end
          
          function symbolify(e)
              return e
          end
          
          function rep_pars_vals!(e::Expr, p)
              rep_pars_vals!.(e.args, Ref(p))
              replace!(e.args, p...)
          end
          
          function rep_pars_vals!(e, p) end
          
          """
          instantiate_function(f, x, ::AbstractADType, p, num_cons = 0)::OptimizationFunction
          
          This function is used internally by Optimization.jl to construct
          the necessary extra functions (gradients, Hessians, etc.) before
          optimization. Each of the ADType dispatches use the supplied automatic
          differentiation type in order to specify how the construction process
          occurs.
          
          If no ADType is given, then the default `NoAD` dispatch simply
          defines closures on any supplied gradient function to enclose the
          parameters to match the interfaces for the specific optimization
          libraries (i.e. (G,x)->f.grad(G,x,p)). If a function is not given
          and the `NoAD` dispatch is used, or if the AD dispatch is currently
          not capable of defining said derivative, then the constructed
          `OptimizationFunction` will simply use `nothing` to specify and undefined
          function.
          
          The return of `instantiate_function` is an `OptimizationFunction` which
          is then used in the optimization process. If an optimizer requires a
          function that is not defined, an error is thrown.
          
          For more information on the use of automatic differentiation, see the
          documentation of the `AbstractADType` types.
          """
          function instantiate_function(f, x, ::SciMLBase.NoAD,
              p, num_cons = 0)
              grad = f.grad === nothing ? nothing : (G, x, args...) -> f.grad(G, x, p, args...)
              hess = f.hess === nothing ? nothing : (H, x, args...) -> f.hess(H, x, p, args...)
              hv = f.hv === nothing ? nothing : (H, x, v, args...) -> f.hv(H, x, v, p, args...)
              cons = f.cons === nothing ? nothing : (res, x) -> f.cons(res, x, p)
              cons_j = f.cons_j === nothing ? nothing : (res, x) -> f.cons_j(res, x, p)
              cons_h = f.cons_h === nothing ? nothing : (res, x) -> f.cons_h(res, x, p)
              hess_prototype = f.hess_prototype === nothing ? nothing :
                               convert.(eltype(x), f.hess_prototype)
              cons_jac_prototype = f.cons_jac_prototype === nothing ? nothing :
                                   convert.(eltype(x), f.cons_jac_prototype)
              cons_hess_prototype = f.cons_hess_prototype === nothing ? nothing :
                                    [convert.(eltype(x), f.cons_hess_prototype[i])
                                     for i in 1:num_cons]
              expr = symbolify(f.expr)
              cons_expr = symbolify.(f.cons_expr)
          
              return OptimizationFunction{true}(f.f, SciMLBase.NoAD(); grad = grad, hess = hess,
                  hv = hv,
                  cons = cons, cons_j = cons_j, cons_h = cons_h,
                  hess_prototype = hess_prototype,
                  cons_jac_prototype = cons_jac_prototype,
                  cons_hess_prototype = cons_hess_prototype,
                  expr = expr, cons_expr = cons_expr,
                  syms = f.syms, paramsyms = f.paramsyms,
                  observed = f.observed)
          end
          
          function instantiate_function(f, cache::ReInitCache, ::SciMLBase.NoAD,
              num_cons = 0)
              grad = f.grad === nothing ? nothing : (G, x, args...) -> f.grad(G, x, cache.p, args...)
              hess = f.hess === nothing ? nothing : (H, x, args...) -> f.hess(H, x, cache.p, args...)
              hv = f.hv === nothing ? nothing : (H, x, v, args...) -> f.hv(H, x, v, cache.p, args...)
              cons = f.cons === nothing ? nothing : (res, x) -> f.cons(res, x, cache.p)
              cons_j = f.cons_j === nothing ? nothing : (res, x) -> f.cons_j(res, x, cache.p)
              cons_h = f.cons_h === nothing ? nothing : (res, x) -> f.cons_h(res, x, cache.p)
              hess_prototype = f.hess_prototype === nothing ? nothing :
                               convert.(eltype(cache.u0), f.hess_prototype)
              cons_jac_prototype = f.cons_jac_prototype === nothing ? nothing :
                                   convert.(eltype(cache.u0), f.cons_jac_prototype)
              cons_hess_prototype = f.cons_hess_prototype === nothing ? nothing :
                                    [convert.(eltype(cache.u0), f.cons_hess_prototype[i])
                                     for i in 1:num_cons]
              expr = symbolify(f.expr)
              cons_expr = symbolify.(f.cons_expr)
          
              return OptimizationFunction{true}(f.f, SciMLBase.NoAD(); grad = grad, hess = hess,
                  hv = hv,
                  cons = cons, cons_j = cons_j, cons_h = cons_h,
                  hess_prototype = hess_prototype,
                  cons_jac_prototype = cons_jac_prototype,
                  cons_hess_prototype = cons_hess_prototype,
                  expr = expr, cons_expr = cons_expr,
                  syms = f.syms, paramsyms = f.paramsyms,
                  observed = f.observed)
          end
          
          function instantiate_function(f, x, adtype::ADTypes.AbstractADType,
              p, num_cons = 0)
              adtypestr = string(adtype)
              _strtind = findfirst('.', adtypestr)
              strtind = isnothing(_strtind) ? 5 : _strtind + 5
              open_nrmlbrkt_ind = findfirst('(', adtypestr)
              open_squigllybrkt_ind = findfirst('{', adtypestr)
              open_brkt_ind = isnothing(open_squigllybrkt_ind) ? open_nrmlbrkt_ind :
                              min(open_nrmlbrkt_ind, open_squigllybrkt_ind)
              adpkg = adtypestr[strtind:(open_brkt_ind - 1)]
              throw(ArgumentError("The passed automatic differentiation backend choice is not available. Please load the corresponding AD package $adpkg."))
          end

This file has been truncated. show original

Honza9723 · November 12, 2020, 2:07pm

@ChrisRackauckas Thank you very much! I did something like that on my own, just very inefficient. So, I just need to plug my loss function into GalacticOptim?

ChrisRackauckas · November 13, 2020, 2:29pm

I think so. . If it doesn’t, let me know.

Honza9723 · November 13, 2020, 3:19pm

@ChrisRackauckas Thank you very much! I will try it! Is there some example, how to put user-defined loss function with Flux neural networks inside to GalacticOptim.jl?

PS: I make it work using my implementation of ADAM that manually de-dualize gradient. Thank you for your guidance!

ChrisRackauckas · November 14, 2020, 1:00am

No problem!

ChrisRackauckas · November 14, 2020, 6:41pm

And for the future, see this update on where our AD tools are going: DifferentialEquations - Derivatives in ODE function/ nesting AD - #2 by ChrisRackauckas

Honza9723 · November 18, 2020, 4:48pm

Hi Chris, I tried to install GalacticOptim.jl, and I get the following error. Is there some tutorial on how to solve this?

Also, I would like to ask, how to optimize loss function which contains Flux neural network with those implicit parameters. Simply collect them using Flux.params?

ERROR: Unsatisfiable requirements detected for package Compat [34da2185]:
 Compat [34da2185] log:
 ├─possible versions are: [1.0.0-1.0.1, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0-1.5.1, 2.0.0, 2.1.0, 2.2.0-2.2.1, 3.0.0, 3.1.0, 3.2.0, 3.3.0-3.3.1, 3.4.0, 3.5.0, 3.6.0, 3.7.0, 3.8.0, 3.9.0-3.9.1, 3.10.0, 3.11.0, 3.12.0, 3.13.0, 3.14.0, 3.15.0, 3.16.0, 3.17.0, 3.18.0, 3.19.0, 3.20.0, 3.21.0, 3.22.0, 3.23.0] or uninstalled
 ├─restricted by compatibility requirements with BlackBoxOptim [a134a8b2] to versions: [1.0.0-1.0.1, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0-1.5.1, 2.0.0, 2.1.0, 2.2.0-2.2.1, 3.0.0, 3.1.0, 3.2.0, 3.3.0-3.3.1, 3.4.0, 3.5.0, 3.6.0, 3.7.0, 3.8.0, 3.9.0-3.9.1, 3.10.0, 3.11.0, 3.12.0, 3.13.0, 3.14.0, 3.15.0, 3.16.0, 3.17.0, 3.18.0, 3.19.0, 3.20.0, 3.21.0, 3.22.0, 3.23.0]
 │ └─BlackBoxOptim [a134a8b2] log:
 │   ├─possible versions are: [0.4.0, 0.5.0] or uninstalled
 │   └─restricted to versions * by an explicit requirement, leaving only versions [0.4.0, 0.5.0]
 ├─restricted by compatibility requirements with TensorFlow [1d978283] to versions: [1.0.0-1.0.1, 1.1.0, 1.2.0, 1.3.0, 1.4.0, 1.5.0-1.5.1, 2.0.0, 2.1.0, 2.2.0-2.2.1]
 │ └─TensorFlow [1d978283] log:
 │   ├─possible versions are: [0.10.2, 0.10.4, 0.11.0] or uninstalled
 │   └─restricted to versions * by an explicit requirement, leaving only versions [0.10.2, 0.10.4, 0.11.0]   
 └─restricted by compatibility requirements with Optim [429524aa] to versions: [3.2.0, 3.3.0-3.3.1, 3.4.0, 3.5.0, 3.6.0, 3.7.0, 3.8.0, 3.9.0-3.9.1, 3.10.0, 3.11.0, 3.12.0, 3.13.0, 3.14.0, 3.15.0, 3.16.0, 3.17.0, 3.18.0, 3.19.0, 3.20.0, 3.21.0, 3.22.0, 3.23.0] — no versions left
   └─Optim [429524aa] log:
     ├─possible versions are: [0.15.3, 0.16.0, 0.17.0-0.17.2, 0.18.0-0.18.1, 0.19.0-0.19.7, 0.20.0-0.20.6, 0.21.0, 0.22.0, 1.0.0, 1.1.0, 1.2.0] or uninstalled
     ├─restricted to versions * by an explicit requirement, leaving only versions [0.15.3, 0.16.0, 0.17.0-0.17.2, 0.18.0-0.18.1, 0.19.0-0.19.7, 0.20.0-0.20.6, 0.21.0, 0.22.0, 1.0.0, 1.1.0, 1.2.0]
     └─restricted by compatibility requirements with GalacticOptim [a75be94c] to versions: [0.22.0, 1.0.0, 1.1.0, 1.2.0]
       └─GalacticOptim [a75be94c] log:
         ├─possible versions are: [0.1.0-0.1.3, 0.2.0-0.2.2, 0.3.0-0.3.1, 0.4.0-0.4.1] or uninstalled        
         └─restricted to versions * by an explicit requirement, leaving only versions [0.1.0-0.1.3, 0.2.0-0.2.2, 0.3.0-0.3.1, 0.4.0-0.4.1]

ChrisRackauckas · November 18, 2020, 5:33pm

I think TensorFlow.jl might be implicitly upper bounding compat.jl? @oxinabox

Though this is an entirely diferent topic so it shouldn’t be in the same thread.

oxinabox · November 18, 2020, 8:19pm

Having TensorFlow.jl in your dependency tree seems like a mistake.
Its pretty stale these days. Like it works, but it is bound to an old version of LibTensorFlow, and its just less fun to use than Flux etc.

Honza9723 · November 18, 2020, 8:45pm

How should I kill it?

oxinabox · November 19, 2020, 12:58pm

] rm TensorFlow i guess.
I recommend reading the package managers docs.
https://julialang.github.io/Pkg.jl/v1/

Honza9723 · November 19, 2020, 1:20pm

Thank you, it works!

Topic		Replies	Views
Gradient error in Flux model inputs Machine Learning question , flux , zygote	5	1324	January 13, 2021
Flux loss: Gradient wrt input leads to empty gradient wrt parameters or to "can't differentiate foreigncall" Machine Learning flux , forwarddiff , diffeqflux	3	558	April 8, 2022
How to use gradient of neural network as the loss function? Machine Learning question	13	2739	March 23, 2021
Flux.gradient throws MethodError even though loss is evaluated just fine New to Julia question , diffeq	3	553	April 16, 2022
Jacobian of a network in the loss function with Flux Machine Learning question	6	247	July 23, 2024

Flux differentiation error

Related topics