Thanks, Mohamed, for highlighting this issue.
Some months ago, my colleague and I ran a slightly different version of this model with LazyArrays 0.16.16, before updating to LazyArrays 0.17.0, and the performance was much better (around 44 seconds, though still not as good as Stan). So maybe something changed in the LazyArrays internals that Turing is not handling very well?
I am trying to benchmark this again by downgrading to LazyArrays 0.16.16, but I am running into some dependency issues. Maybe I can find a Manifest.toml somewhere to replicate this phenomenon…
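
For anyone who wants to try the same downgrade, here is a minimal sketch using a throwaway environment. It assumes a Julia version recent enough to support `Pkg.activate(; temp=true)`, and it is not guaranteed to resolve: if the compat bounds of the installed Turing release exclude LazyArrays 0.16.16, the resolver will either pick an older Turing or fail with an error.

```julia
using Pkg

# Use a temporary environment so the main project's Manifest.toml is left untouched.
Pkg.activate(; temp=true)

# Request the older LazyArrays release together with Turing, so the resolver
# can choose a Turing version whose compat bounds allow LazyArrays 0.16.16.
Pkg.add([
    Pkg.PackageSpec(name="LazyArrays", version="0.16.16"),
    Pkg.PackageSpec(name="Turing"),
])

# Pin LazyArrays so later `Pkg.update()` calls do not move it to 0.17.x.
Pkg.pin("LazyArrays")

# Check which versions were actually installed before benchmarking.
Pkg.status()
```

If this resolves, the Manifest.toml of that temporary environment records the exact versions used, which would make the comparison against LazyArrays 0.17.0 reproducible.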