Making Turing Fast with large numbers of parameters?

This is an odd example. My intuition tells me we should be about as fast as Stan here if not faster due to SIMD. This is a nice model. I think it’s worth digging into why Stan is so much faster here.

Thanks Mohamed for stressing this issue.

Some months ago my colleague and I tried a slightly different version of this model with LazyArrays 0.16.16, and the performance was much better (around 44 seconds, though still not as good as Stan). That was before we updated to LazyArrays 0.17.0, so maybe something changed in LazyArrays internals that Turing is not handling very well?

I am trying to benchmark this again by downgrading to LazyArrays 0.16.16, but there are some dependency issues going on. Maybe I can find a Manifest.toml somewhere to replicate this phenomenon…

Yes, if you can go back to previous Turing versions too, it might expose a regression that happened somewhere in Turing. The syntax of Turing has been pretty stable for a while, so your model should probably just work. Another suggestion is to avoid NUTS and use static HMC instead in both Turing and Stan. That helps identify whether the issue is the NUTS implementation or overhead added by the Turing infrastructure and AD.
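For the Turing side of that comparison, a minimal sketch could look like the following. The toy model, the simulated data, and the HMC step size and leapfrog count are all placeholders rather than the model from this thread; swap in your own model and use the same static HMC settings in Stan.

```julia
using Turing, Random

# Hypothetical stand-in model; replace with the model under investigation.
@model function toy_model(y)
    μ ~ Normal(0, 5)
    σ ~ Exponential(1)
    for i in eachindex(y)
        y[i] ~ Normal(μ, σ)
    end
end

Random.seed!(1)
y = randn(50) .+ 2.0          # simulated data, for illustration only
model = toy_model(y)
n_draws = 200

# Short warm-up runs so the timings below mostly exclude compilation.
sample(model, HMC(0.05, 10), 10; progress=false)
sample(model, NUTS(), 10; progress=false)

# Static HMC: fixed step size (0.05) and fixed number of leapfrog steps (10).
t0 = time()
chain_hmc = sample(model, HMC(0.05, 10), n_draws; progress=false)
t_hmc = time() - t0

# NUTS: adapts the step size and picks the trajectory length itself.
t0 = time()
chain_nuts = sample(model, NUTS(), n_draws; progress=false)
t_nuts = time() - t0

println("static HMC: ", round(t_hmc; digits=2), " s, NUTS: ", round(t_nuts; digits=2), " s")
```

If static HMC is already much slower than Stan per leapfrog step, the overhead is more likely in the model evaluation or AD than in NUTS itself.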

Would it be useful to create a project to do benchmarks of some fixed set of models? I think that Andrew Gelman and Aki Vehtari are putting together some kind of model “bestiary” with reference fits so that people can check their fitting software against known good data sets / samples / posterior draws. I remember reading about that somewhere, but I can’t find the reference post now.

But anyway, I do think it might be quite useful to put together a bestiary of models, perhaps from sources like textbooks or the like, then implement them in Turing, and have a testing script that runs the models, compares the draws to known good draws, and measures the sampling time etc.

This would be a good way to find both performance and correctness regressions or improvements.
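As a rough sketch of what one entry in such a testing script might do, the helper below times a sampler and compares its draws with reference draws. The function name, the tolerance, and the simulated "reference" draws are all made up for illustration; a real harness would load reference draws from something like a posterior database and use a sturdier comparison than means and standard deviations.

```julia
using Statistics, Random

# Hypothetical helper: time a sampler and compare its draws with reference draws.
# `sampler` is any zero-argument function returning draws for a single parameter.
function benchmark_against_reference(sampler, reference::AbstractVector; tol=0.1)
    t0 = time()
    draws = sampler()
    seconds = time() - t0
    # Scale the difference in means by the reference spread so `tol` is unit-free.
    mean_err = abs(mean(draws) - mean(reference)) / std(reference)
    sd_ratio = std(draws) / std(reference)
    ok = mean_err < tol && abs(sd_ratio - 1) < tol
    return (seconds=seconds, mean_err=mean_err, sd_ratio=sd_ratio, ok=ok)
end

rng = MersenneTwister(42)
reference = randn(rng, 10_000) .* 2 .+ 1     # stand-in for known-good posterior draws

# A "sampler" that matches the reference, and one that is clearly off in location.
good = benchmark_against_reference(() -> randn(rng, 10_000) .* 2 .+ 1, reference)
bad = benchmark_against_reference(() -> randn(rng, 10_000) .* 2 .+ 3, reference)

println("good passes: ", good.ok, ", bad passes: ", bad.ok)
```

Recording `seconds` next to `ok` for every model on every release is what would let the script flag both performance and correctness regressions.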

I’d be willing to work on that a bit if I could get a collaborator on it.

A lot of discussion on this is in the thread "Addressing Stan speed claims in general" (post #17 by adlauretig) in the Stan Governance category of The Stan Forums. Aki also linked to posteriordb on GitHub (stan-dev/posteriordb), a database with posteriors of interest for Bayesian inference. (I’ve no clue about such issues, just sharing the link!)

Yes that was it, posteriordb! Thanks for the link.

Also, I just finished reading that thread. An interesting read. I liked @ChrisRackauckas's contribution. I think it’s hard for people to understand the level of composability we get from Julia. Turing is such a great resource precisely because you can use so much stuff from the generic Julia ecosystem. For example, suppose you want to do something slightly wacky like serialize a bunch of intermediate computations to disk at every evaluation so as to observe the behavior of the model fit… No one has to give you permission, just call serialize, or write a CSV. Want to profile the model? Just run Julia’s profiler. Want to run an optimization? Call an optimizer. Trying to do inference on a satellite image of the Earth? Use image processing libraries…
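To make the "just call serialize" point concrete, here is a small sketch using only the Serialization standard library. The log-density function, the file naming, and the data are hypothetical; the same call could sit inside any function that gets evaluated during a fit.

```julia
using Serialization

# Directory and counter for the per-evaluation snapshots (illustrative only).
dump_dir = mktempdir()
call_count = Ref(0)

# Hypothetical log-density that dumps its intermediate values on every call.
function logdensity_with_dump(θ::AbstractVector, y::AbstractVector, dir, counter)
    residuals = y .- θ[1]
    lp = -0.5 * sum(abs2, residuals)   # unnormalised Gaussian log-likelihood, unit variance
    counter[] += 1
    snapshot = (θ=copy(θ), residuals=residuals, lp=lp)
    serialize(joinpath(dir, "eval_$(counter[]).jls"), snapshot)
    return lp
end

y = [1.0, 2.0, 3.0]
lp = logdensity_with_dump([2.0], y, dump_dir, call_count)

# Read the first snapshot back to inspect what happened during that evaluation.
snapshot = deserialize(joinpath(dump_dir, "eval_1.jls"))
println(snapshot)
```

Writing to disk on every evaluation will slow sampling down a lot, so this is something to switch on for debugging rather than leave in place.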

OK, so I ran into a variety of difficulties with arraydist(LazyArray...) this morning, which I ultimately worked around by doing something like @addlogprob!(sum(logpdf.(...))). That seems to be working.

Since the related bug report seems to say that arraydist is going to be deprecated, I guess the question is where this kind of functionality is going to live and how we will tell people to use it in the future. Anyone who wants to weigh in on that can join the bug report:
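For anyone who wants to see the shape of that workaround, here is a minimal sketch. The model, the grouping structure, and the simulated data are hypothetical and much simpler than the model discussed in this thread; the only point is that the likelihood is added in one vectorised step with Turing.@addlogprob! rather than through a `~` statement with arraydist.

```julia
using Turing, Random

# Hypothetical model with one mean per group; not the model from this thread.
@model function group_means(y, group, n_groups)
    μ ~ filldist(Normal(0, 5), n_groups)
    σ ~ Exponential(1)
    # Add the whole likelihood to the log density at once, in place of
    # something like y ~ arraydist(LazyArray(...)).
    Turing.@addlogprob! sum(logpdf.(Normal.(μ[group], σ), y))
    return nothing
end

Random.seed!(1)
n_groups = 3
group = rand(1:n_groups, 60)                       # group index per observation
y = [1.0, 2.0, 3.0][group] .+ 0.5 .* randn(60)     # simulated data, illustration only

chain = sample(group_means(y, group, n_groups), NUTS(), 100; progress=false)
```

One trade-off, as far as I understand it: because y no longer appears on the left of a `~`, Turing does not see it as an observation, so anything that relies on `~` statements for the data will not account for it.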

https://github.com/TuringLang/Turing.jl/issues/1723

Is it reasonable that on an average laptop running Windows 11 this same code takes 118 seconds to run? My own code is taking ages to run and I’m trying to understand if this is specific to my code or a more general issue with running Julia on my machine. Sorry if this is an odd question, thanks!