Numerical errors in logit normal model using Turing.jl

How long did it take to run on your system? After running about 2%, the ETA is around 11.5 hours.

It took a few minutes. Hours sounds strange.

Could you please post your setup? Are you on master?

I am using Turing v0.7.1 because Zygote is still causing problems for me.


(v1.2) pkg> st Turing
    Status `~/.julia/environments/v1.2/Project.toml`
  [0bf59076] AdvancedHMC v0.2.7
  [31c24e10] Distributions v0.21.6
  [c7f686f2] MCMCChains v0.3.14
  [276daf66] SpecialFunctions v0.8.0
  [4c63d2b9] StatsFuns v0.9.0
  [fce5fe82] Turing v0.7.1
  [e88e6eb3] Zygote v0.4.1

I was able to switch to master without running into any problems with Zygote, and the model ran in about one minute. I’m guessing that this is still much slower than Stan, but it is far better than 11.5 hours. What would cause a nearly 700-fold improvement?

I’m not 100% sure, but it sounds like the gradient computation is falling back to computing each gradient independently instead of vectorising the computation. That should not actually happen anymore, though.

As far as I know, there should be a new patch release that fixes the version issue with Zygote and IRTools.

It seems that FillArrays is interacting badly with Tracker and Distributions. Calling fill on a TrackedReal returns a TrackedArray because that is how it was defined in Tracker, and MvNormal supports a TrackedArray mean vector and is tested for it. However, we don’t properly test or overload any methods for a FillArrays array of TrackedReal values, so it is falling back on some wrong methods. Please open an issue so this can be investigated further.
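
For anyone who wants to see the difference between the two code paths, here is a minimal sketch that only inspects the types involved. It assumes Tracker and FillArrays are both installed in the current environment; the variable names are made up for illustration, and the exact types printed may differ between package versions, so treat it as a starting point for an issue report rather than a diagnosis.

```julia
using Tracker, FillArrays

# A tracked scalar, as used by the Tracker AD backend.
x = Tracker.param(1.0)

# Base.fill: according to the explanation above, Tracker defines this
# to return a TrackedArray when given a TrackedReal.
a = fill(x, 3)

# FillArrays.Fill: a lazy array that stores the TrackedReal once.
# This is the case described above as having no dedicated overloads.
b = Fill(x, 3)

# Print the concrete types so the two paths can be compared.
@show typeof(x)
@show typeof(a)
@show typeof(b)
@show a isa Tracker.TrackedArray
@show b isa Tracker.TrackedArray
```

If the last two lines disagree, code that dispatches on TrackedArray (such as an MvNormal with a tracked mean vector) would take different methods for `a` and `b`, which would be consistent with the fallback described above.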
