Making Turing Fast with large numbers of parameters?

On the other hand, my main model, the one I’m actually trying to get to run, is slow as a dog and has no truncations… but perhaps it also has a similar “forcing interpretation” issue; I just don’t know what causes it. I would be happy to try to find an MWE that doesn’t use truncation.

Maybe try using JET.jl’s ahead-of-time optimization analysis? See Optimization Analysis · JET.jl

It’s similar to C++ in that you can catch type instabilities and other issues before running the code.

Also, the fact that C++ code compiles doesn’t necessarily mean it’s going to be fast… the compiler can still produce suboptimal code, e.g. vtable (dynamic) dispatch.
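
For example, something like this (a minimal sketch; @report_opt is the entry point for the optimization analysis in recent JET versions, so double-check the linked docs for the version you have installed):

    # Minimal sketch: JET's optimization analysis reports runtime-dispatch sites
    # without running the full computation.
    using JET

    slow_sum(xs) = sum(xs)              # fine for Vector{Float64}, dispatch-heavy for Vector{Any}

    @report_opt slow_sum(Any[1, 2, 3])  # flags the runtime dispatch caused by the abstract eltype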

That sounds like a good thing to look at, but I doubt that type instability is the issue here. I mean, that would explain a slowdown of a factor of say 3 or 10 or similar, but not a factor of 1000 or whatever is going on here. It literally sits there for tens of minutes saying 0% and offering no guess as to how long sampling will take, whereas normally within 2 to 5 seconds it’s already at 5% or something and saying sampling will take a few minutes.

However, I will happily take that JET idea and put it on my “things to use for performance” checklist for once I figure out what’s going on.

Check out the lazy arrays trick here https://github.com/TuringLang/TuringExamples/blob/master/benchmarks/logistic_reg/turing.jl.

I don’t know much about LazyArrays. What is it doing in this case? Does it change the results for you? I will give it a try tomorrow.

Check the Logistic Regression tutorial that uses LazyArrays and explains what is going on…

Well, it says it’s a lazy array, but I’m not sure how that actually helps. What does it do computationally? Saves storage? Makes things work better with autodiff? Improves the cache performance? Avoids branching?

Yes, yes, yes and sometimes. The idea is that when using Tracker or ReverseDiff you want to work in terms of tracked arrays of numbers as much as possible, as opposed to arrays of tracked reals. The problem with constructing an array of distributions that depends on the parameters is that we lose this tracked array of parameters when we construct a distribution (e.g. Bernoulli) from each number in the input vector. The logpdf values then get computed using “scalar arithmetic”, one call per distribution, as opposed to “vectorised arithmetic” for all the distributions at once, and that’s bad for autodiff performance.

The lazy array trick enables us to create a pipeline where the input is numbers (a tracked array of parameters) and the output is numbers (a tracked array of logpdfs); there can be intermediate operations that construct distributions or call other functions, but these intermediate values never get separately instantiated (or allocated in memory). This lazy pipeline is implemented using lazy broadcasting under the hood, and the way broadcasting autodiff works for ReverseDiff’s TrackedArray is by using ForwardDiff on the broadcasted function, which means it can also handle branches better.

TrackedArrays also have better cache locality than arrays of TrackedReal. So it’s a neat little hack that hits half a dozen birds with one stone.
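
For concreteness, here is a minimal sketch of the pattern from the logistic regression example linked above (hypothetical data and variable names; the actual tutorial code may differ slightly):

    # Minimal sketch (hypothetical names): the whole pipeline from the tracked
    # parameter vector β to the per-observation logpdfs stays lazy, so
    # ReverseDiff/Tracker see tracked arrays instead of arrays of tracked reals.
    using Turing, LazyArrays, StatsFuns

    @model function logreg(X, y)
        β ~ filldist(Normal(0.0, 1.0), size(X, 2))
        p = logistic.(X * β)              # tracked array of probabilities
        # One Bernoulli per observation, constructed lazily: no Vector{Bernoulli}
        # is ever allocated, and the logpdfs come out as one tracked array.
        y ~ arraydist(LazyArray(Base.broadcasted(Bernoulli, p)))
    end

The LazyArray(@~ Bernoulli.(p)) spelling should be equivalent to the explicit Base.broadcasted call.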

This is great to know about…but, to echo my comment above, it’s not in the Turing performance tips (or anywhere else in the documentation).

I also have a question: is there any situation where you wouldn’t want to use a LazyArray inside an arraydist?

A PR would be appreciated :slight_smile:

For an arraydist of multivariate distributions, I don’t think lazy arrays will help. For arraydist priors that don’t depend on any sampled parameter, lazy arrays are also unlikely to improve things, since those priors don’t interact with autodiff at all.
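
To illustrate that distinction with a toy sketch (hypothetical names, not from this thread):

    # Toy sketch: the prior below doesn't depend on any sampled parameter, so
    # laziness buys nothing there; the observation statement constructs one
    # distribution per parameter-dependent mean, which is where the lazy
    # pipeline helps ReverseDiff/Tracker.
    using Turing, LazyArrays

    @model function demo(y)
        μ ~ filldist(Normal(0.0, 1.0), length(y))   # prior: plain filldist/arraydist is fine
        σ ~ Exponential(1.0)
        y ~ arraydist(LazyArray(Base.broadcasted(m -> Normal(m, σ), μ)))  # likelihood: lazy helps
    end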

It really should be written by someone who actually knows what is going on, though :slight_smile:

True. But it would be faster if someone took their understanding from a thread like this, put it in a PR with some examples, and then people from the Turing team corrected or edited that PR.

Well, I’m certainly willing to do that. It would help if I understood exactly how LazyArrays really works, and what the code should ideally look like. I’m thinking…

    totweight ~ arraydist([truncated(Normal(staffweights[staffid[i]] + patientweights[patientid[i]] + usewch[i] * wcwt, measerr),0.0,Inf) for i in 1:length(totweight)])

becomes:

    means = @~ view(staffweights,staffid) .+ view(patientweights,patientid) .+ usewch .* wcwt
    totweight ~ arraydist(LazyArray(Base.broadcasted(m -> truncated(Normal(m,measerr),0.0,Inf),
                                                     means)))

But this gives an error:

ERROR: MethodError: no method matching arraydist(::BroadcastVector{Any, var"#182#184", Tuple{BroadcastVector{Float64, typeof(+), Tuple{BroadcastVector{Float64, typeof(+), Tuple{SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false}, SubArray{Float64, 1, Vector{Float64}, Tuple{Vector{Int64}}, false}}}, BroadcastVector{Float64, typeof(*), Tuple{Vector{Int64}, Float64}}}}}})
Closest candidates are:
  arraydist(::AbstractVector{<:UnivariateDistribution}) at /home/dlakelan/.julia/packages/DistributionsAD/b93cZ/src/arraydist.jl:5
  arraydist(::AbstractMatrix{<:UnivariateDistribution}) at /home/dlakelan/.julia/packages/DistributionsAD/b93cZ/src/arraydist.jl:17
  arraydist(::AbstractVector{<:MultivariateDistribution}) at /home/dlakelan/.julia/packages/DistributionsAD/b93cZ/src/arraydist.jl:47
Stacktrace:

It makes me think the views are a problem here…

It also doesn’t help if I wrap the @~ as in LazyArray(@~ ...).

@mohamed82008 is there a modification to Turing which specifically handles BroadcastVectors that I don’t have? Is this something in the master branch of Turing or somewhere else? When I look at my install:

julia> methods(arraydist)
# 3 methods for generic function "arraydist":
[1] arraydist(dists::AbstractVector{<:UnivariateDistribution}) in DistributionsAD at /home/dlakelan/.julia/packages/DistributionsAD/b93cZ/src/arraydist.jl:5
[2] arraydist(dists::AbstractMatrix{<:UnivariateDistribution}) in DistributionsAD at /home/dlakelan/.julia/packages/DistributionsAD/b93cZ/src/arraydist.jl:17
[3] arraydist(dists::AbstractVector{<:MultivariateDistribution}) in DistributionsAD at /home/dlakelan/.julia/packages/DistributionsAD/b93cZ/src/arraydist.jl:47

I am trying to see if I can find a solution to your problem. But I have a question so I can better understand what is going on. In this particular case, why are you using arraydist?

I’ve tried running a few models using different distributions, and so far I have not encountered your problem. It was only when I used truncated(Normal(...)...) or arraydist that I had issues.

The Turing performance tips say that using arraydist is somehow better/faster than writing loops. At the very beginning I had a loop over Normals, which I converted to an MvNormal, and it went tens to hundreds of times faster… but now I want a shape other than normal (one that is restricted to positive values).
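
For reference, the loop-to-MvNormal rewrite I mean looked roughly like this (a toy sketch, not my actual model):

    # Toy sketch of the loop-to-MvNormal rewrite (hypothetical model).
    using Turing, LinearAlgebra

    @model function loop_version(x, y)
        β ~ Normal(0.0, 1.0)
        σ ~ Exponential(1.0)
        for i in eachindex(y)
            y[i] ~ Normal(β * x[i], σ)   # one scalar observation per iteration: slow under AD
        end
    end

    @model function mvnormal_version(x, y)
        β ~ Normal(0.0, 1.0)
        σ ~ Exponential(1.0)
        y ~ MvNormal(β .* x, σ^2 * I)    # a single vectorised observation statement: much faster
    end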

In my real problem, like in this problem, the data is restricted to positive values, but unlike in this problem, my real problem can have “most likely” values down near 0, with errors only really possible in the positive direction. If I don’t model that correctly then the model can predict negative values for a quantity that can only logically be 0 or bigger.

Gotcha. My understanding (which is probably wrong) is that arraydist and filldist speed things up when dealing with parameters, but not for the observations. Hopefully someone will correct me if I am wrong.

You could write it using standard broadcasting syntax. That worked fine for me when I rewrote your model 2.


    # Gamma with shape 15 and scale θ/14: the mode is at θ and the support is positive.
    θ = staffweights[staffid] .+ patientweights[patientid] .+ usewch .* wcwt
    totweight .~ Gamma.(15, θ ./ 14)

Another option might be a custom distribution. An example using MvBinomial is here.

My only other (probably unhelpful) suggestions at this point for keeping things positive are the obvious options (e.g., modelling log(y), or using something like a Gamma distribution, as in your example), which I assume you have tried or cannot use.

I’ve tried a Gamma, but not using the broadcast syntax, and it was crazy slow. I’ll give it a try with broadcast!

I like this LazyArrays idea, but it doesn’t seem to work in my case.

That’s definitely valid. But with the way broadcasting is done in Turing, the RHS will be evaluated first and the broadcasting of the logpdf computation will be done afterwards. When using ForwardDiff or Zygote, this is fine. When using Tracker or ReverseDiff, you can do better because of the TrackedArray issue I mentioned above.

That’s another good option. A perhaps easier alternative is to accumulate the observations’ log probabilities manually using the Turing.@addlogprob! macro. So you can do something like:

Turing.@addlogprob! sum(logpdf.(Gamma.(15, θ ./ 14), totweight))

This line of code is what the lazy arrays trick will evaluate under the hood.
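
In context that could look something like this (a sketch with made-up priors; note there is no ~ statement for totweight here, so it must be passed in as observed data):

    # Sketch only: the priors are made up, and totweight is observed data because
    # its log probability is added manually rather than via a ~ statement.
    using Turing

    @model function weights_model(staffid, patientid, usewch, totweight, nstaff, npatient)
        staffweights ~ filldist(Normal(75.0, 10.0), nstaff)
        patientweights ~ filldist(Normal(75.0, 10.0), npatient)
        wcwt ~ Normal(15.0, 5.0)
        θ = staffweights[staffid] .+ patientweights[patientid] .+ usewch .* wcwt
        # Add the observations' joint log probability directly to the log density
        # (assumes θ stays positive, as in the Gamma example above).
        Turing.@addlogprob! sum(logpdf.(Gamma.(15, θ ./ 14), totweight))
    end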

Yeah, I’m actually thinking this might be a smart idea because the thing I’m modeling has a variety of weird aspects to it. If I did this I could make sure it had its own gradient.

What line of code would I use to make the LazyArray method work? I couldn’t get it to work; it would error out saying arraydist couldn’t take that type of object (see the errors above).

The problem seems to be that the type of the vector is BroadcastVector{Any}. It looks like a LazyArrays.jl inference issue. Try asserting the type of the output of the broadcasted function to give the compiler a hint.
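
For example, something along these lines (untested; asserting an abstract return type like ContinuousUnivariateDistribution avoids writing out the exact Truncated type, which varies across Distributions.jl versions, and is still enough for arraydist’s method signatures):

    # Untested sketch: the return-type assertion lets the lazy vector's element
    # type be inferred as a UnivariateDistribution subtype rather than Any,
    # which is what arraydist's methods require.
    means = staffweights[staffid] .+ patientweights[patientid] .+ usewch .* wcwt
    dists = LazyArray(Base.broadcasted(means) do m
        truncated(Normal(m, measerr), 0.0, Inf)::ContinuousUnivariateDistribution
    end)
    totweight ~ arraydist(dists)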
