I think the problem traces back to sol.W being nothing in the backward pass. Using ForwardDiff instead of Zygote to compute the gradients
@time gs = ForwardDiff.gradient(p) do p
    some_loss(p)
end
removes the error. If reverse-mode AD is needed, you could use pre-defined noise values and NoiseGrid from the DiffEqNoiseProcess package
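t = collect(tspan[1]:dt:tspan[2])        # Julia is 1-indexed: tspan[1] is the start, tspan[2] the end
Z = randn(length(t))
Z1 = cumsum([0; sqrt(dt) * Z[1:end-1]])  # pre-sampled Brownian path on the grid, starting at 0
# Return the grid from the do-block; an assignment inside it would stay local to the closure
NG = Zygote.ignore() do
    NoiseGrid(t, Z1)
end
tmp_prob = remake(prob, p=p, noise=NG)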
as a possible workaround. @ChrisRackauckas, do you have a better idea for using the noise values in a loss function with Zygote?
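
To sanity-check the pre-sampled path on its own, here is a minimal sketch that uses only Julia's standard library and does not touch NoiseGrid or Zygote. The values of tspan and dt and the seed are assumptions chosen for illustration, and the name W stands in for Z1 above. It only confirms that the path starts at zero and that its increments are the scaled normal draws.

```julia
using Random

Random.seed!(1234)      # assumed seed, so the same path is reused across loss evaluations

tspan = (0.0, 1.0)      # assumed time span
dt = 0.01               # assumed step size

t = collect(tspan[1]:dt:tspan[2])
Z = randn(length(t))

# Cumulative sum of sqrt(dt)-scaled normal draws, shifted so the path starts at 0
W = cumsum([0; sqrt(dt) * Z[1:end-1]])

# Increments of the path should recover the scaled draws
dW = diff(W)

@assert length(W) == length(t)
@assert W[1] == 0
@assert dW ≈ sqrt(dt) .* Z[1:end-1]
```

Fixing the seed (or sampling Z once outside the loss) keeps the noise realization identical between the forward and backward passes, which is the point of passing a pre-defined grid.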