Parameters of the neural network not updating after training in a Neural ODE problem

I guess it was just vanishing gradients: when the initial weights are too large, the tanh units saturate, their derivative drops to nearly zero, and almost no gradient flows back through those layers. At least starting with smaller weights has every layer updating nicely:

julia> optprob = Optimization.OptimizationProblem(optf, Float32(1e-2) .* ComponentVector(_para))
OptimizationProblem. In-place: true
u0: ComponentVector{Float32}(layer_1 = (weight = Float32[0.005868697 -0.0015597978 -0.009511871; -0.009119327 -0.006244393 0.0003362576; …

julia> res1 = Optimization.solve(optprob, OptimizationOptimisers.Adam(), callback = callback, maxiters = 75);

julia> res1.u
ComponentVector{Float32}(layer_1 = (weight = Float32[-0.010741851 -0.01817102 -0.026154043; -0.024623511 -0.021738427 -0.015188978; …
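The saturation effect itself is easy to check in isolation. This is just an illustrative sketch (not the Neural ODE model above): the local gradient of tanh is 1 - tanh(x)^2, so once a pre-activation is large in magnitude, essentially no gradient passes through that unit, while pre-activations near zero (as you get after scaling the weights by 1e-2) leave the gradient almost untouched:

```julia
# Local gradient of tanh: d/dx tanh(x) = 1 - tanh(x)^2.
tanh_grad(x) = 1 - tanh(x)^2

# Pre-activations you might see with O(1) weights vs. weights scaled by 1e-2
# (the values 5.0 and 0.05 are illustrative, not taken from the model above).
saturated = tanh_grad(5.0f0)   # saturated unit: gradient factor near zero
linear    = tanh_grad(0.05f0)  # unit in the linear regime: gradient ≈ 1

println(saturated)  # ≈ 1.8f-4
println(linear)     # ≈ 0.9975
```

Chaining a few factors like 1.8e-4 across layers is what makes the early layers appear frozen, which matches the parameters barely moving before the rescaling.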