Clarification on Setrdcache in Turing?

dlakelan · October 15, 2022, 5:00am

So, https://turing.ml/dev/docs/using-turing/autodiff says:

However, note that the use of caching in certain types of models can lead to incorrect results and/or errors. Models for which the compiled tape can be safely cached are models with fixed size loops and no run-time if statements. Compile-time if statements are fine.

I’m not clear on what the difference is between “runtime if statements” and “compile-time if statements”.

Suppose I’m looping through some of the data:

```julia
# data is fixed, observed input; pred, myfun and myotherfun are placeholders
for i in 1:10
    if data[i] > 1.0
        pred[i] = myfun(data[i])
    else
        pred[i] = myotherfun(data[i])
    end
end
```

Is this “compile-time” or “runtime”? I’d assume it’s runtime, because the decisions are made when I “run” the model. But maybe not, since every pass through this loop makes the same decisions because the data is fixed?

It’s clear to me that if I make decisions based on the value of parameters then a cached copy of the “tape” would be wrong for some parameter values. In the absence of any branches on parameter values it’s not clear what would go wrong.

ChrisRackauckas · October 15, 2022, 8:17pm

It’s run time, because the information inside an array is not known at compile time. The compiler only knows the type information (data is an array) but not the values. (In general it can do some constant propagation, but not through the contents of arrays.)
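To make the distinction concrete, here is a small sketch in plain Julia (no Turing involved; the names scale, choose and tuple_branch are hypothetical). The first branch depends only on the argument's type, which the compiler knows when it specializes the method, so it can be resolved at compile time. The second depends on a value stored in an array, which is only known when the code runs. The third uses a tuple literal, where constant propagation can usually fold the comparison away.

```julia
# Branch on the argument's *type*: when Julia compiles `scale` for a concrete
# argument type such as Float64, the `isa` test is already known.
scale(x) = x isa AbstractFloat ? 2x : x

# Branch on a *value* stored in an array: the compiler only knows that `data`
# is a Vector{Float64}, not what `data[i]` is, so the test runs on every call.
choose(data, i) = data[i] > 1.0 ? :myfun : :myotherfun

# A tuple literal inside a function: its contents are visible to the compiler,
# so constant propagation can usually resolve this comparison at compile time.
function tuple_branch()
    a = (1, 2, 3)
    return a[2] == 2 ? :taken : :not_taken
end

data = [0.5, 2.0, 3.0]
scale(1.5)        # → 3.0
choose(data, 1)   # → :myotherfun
choose(data, 2)   # → :myfun
tuple_branch()    # → :taken
```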

dlakelan · October 15, 2022, 9:36pm

I think you meant to say “it’s run time” ??

So I assume something like:

```julia
a = (1, 2, 3)
if a[2] == 2
    # ...
end
```

would be a compile-time if statement, that is, it’ll get resolved at compile time.

What I’m having a hard time understanding is whether Turing with ReverseDiff and a compiled tape will actually have a problem with something like:

```julia
if data[i] > 1
    # ...
else
    # ...
end
```

Assume data is passed into the model function and is constant throughout the execution of the sampling procedure, so every time execution reaches this point it takes the same branch.
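One way to see why fixed data is harmless while parameter-dependent branches are not is a toy model of a compiled tape, sketched below in plain Julia. This is not how ReverseDiff.jl is implemented internally; it only mimics the property described in this thread: the control flow is evaluated once, at recording time, and every replay reuses whichever branch was taken then. ToyTape, record_data_branch, record_param_branch, replay and the two branch formulas are all hypothetical.

```julia
# Toy model of a compiled tape (NOT ReverseDiff's real implementation):
# control flow runs once at recording time; replay reuses the stored formulas.
struct ToyTape
    steps::Vector{Function}   # one stored formula θ -> prediction per observation
end

# Hypothetical branch formulas standing in for myfun / myotherfun.
myfun(θ, d) = θ * d
myotherfun(θ, d) = θ + d

# Recording with a branch on DATA only: θ plays no role in which formula is stored.
function record_data_branch(data, θ)
    steps = Function[]
    for d in data
        push!(steps, d > 1.0 ? (t -> myfun(t, d)) : (t -> myotherfun(t, d)))
    end
    return ToyTape(steps)
end

# Recording with a branch on the PARAMETER: the stored formula depends on θ at record time.
function record_param_branch(data, θ)
    steps = Function[]
    for d in data
        push!(steps, θ > 0.0 ? (t -> myfun(t, d)) : (t -> myotherfun(t, d)))
    end
    return ToyTape(steps)
end

# Replay: apply the stored formulas to a new θ without re-evaluating any `if`.
replay(tape::ToyTape, θ) = [s(θ) for s in tape.steps]

# Fresh evaluation: re-run the control flow at the new θ (the correct answer).
fresh_data_branch(data, θ) = [d > 1.0 ? myfun(θ, d) : myotherfun(θ, d) for d in data]
fresh_param_branch(data, θ) = [θ > 0.0 ? myfun(θ, d) : myotherfun(θ, d) for d in data]

data = [0.5, 2.0, 3.0]                   # fixed observations
tape_d = record_data_branch(data, 1.0)   # both tapes recorded at θ = 1.0
tape_p = record_param_branch(data, 1.0)

replay(tape_d, -1.0) == fresh_data_branch(data, -1.0)    # → true
replay(tape_p, -1.0) == fresh_param_branch(data, -1.0)   # → false: stale branch replayed
```

In this toy, the data-driven tape agrees with fresh evaluation at every θ, because the recorded decisions never depended on θ; the parameter-driven tape is only right for θ values on the same side of the branch as the recording point.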

Let’s consider another example. Imagine score and school are both data passed in.

Compare the case where school can be either 1 or 2, and we do:

```julia
score[i] ~ Normal(coef[school[i]], 1)
```

with the case where school can be either 10 or 20, and we do:

```julia
score[i] ~ Normal(school[i] == 10 ? coef[1] : coef[2], 1)
```

Clearly the second one has a runtime if statement (inside the ?: operator), but both do exactly the same thing computationally.
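The two school formulations can be checked to agree numerically, and the data-only branch can also be moved out of the model entirely by mapping the data to 1/2 indices once, before sampling. A plain-Julia sketch with made-up values; school_idx, mu_branch and mu_index are hypothetical names, and in a Turing model coef would be a parameter whose entries feed Normal(…, 1):

```julia
school = [10, 20, 20, 10]   # hypothetical data, coded 10/20
coef = [0.3, -1.2]          # stand-in for the parameter vector

# Version with a run-time branch on the data inside the model body.
mu_branch = [school[i] == 10 ? coef[1] : coef[2] for i in eachindex(school)]

# Same computation with the branch moved out of the model: convert the data
# to indices once, then index as in the coef[school[i]] version.
school_idx = [s == 10 ? 1 : 2 for s in school]
mu_index = [coef[school_idx[i]] for i in eachindex(school)]

mu_branch == mu_index       # → true
```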

I think we need a more nuanced description of what is and isn’t OK in a Turing model with a compiled tape. To me it seems most likely that the problem is any branch predicated on the value of a parameter: in that case the branch taken can vary from one sample to another, and replaying the tape would give incorrect results.

Models in which the value of the data in one column selects a different model for the data in another column are incredibly common, so if that’s an actual problem we need to make it very explicit in the docs.

dlakelan · October 16, 2022, 1:33am

Here’s another example:

```julia
function sigmoid(x)
    if x < zero(x)
        exp(x) / (one(x) + exp(x))
    else
        one(x) / (one(x) + exp(-x))
    end
end
```

This is just using branches to avoid overflow/underflow, so that exp always takes a non-positive argument. The function itself is C^∞ (smooth).
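It may help that both branches of this particular sigmoid compute the same mathematical function, 1/(1 + e^(-x)); the branch only picks the numerically safer formula. So, assuming a compiled tape simply re-applies whichever formula it recorded, the danger here would be overflow (a NaN) rather than a wrong derivative formula. A quick check in plain Julia; naive_a and naive_b are hypothetical names for the two branch formulas:

```julia
# The two formulas used by the branches of `sigmoid` (hypothetical names).
naive_a(x) = exp(x) / (one(x) + exp(x))    # formula used when x < 0
naive_b(x) = one(x) / (one(x) + exp(-x))   # formula used when x >= 0

# Mathematically both equal 1 / (1 + e^(-x)); for moderate x they agree to rounding.
agree = all(isapprox(naive_a(x), naive_b(x)) for x in -5.0:0.5:5.0)   # → true

# At extreme inputs, the formula from the "other" branch breaks down numerically:
naive_a(800.0)   # → NaN (exp(800.0) overflows to Inf, and Inf / Inf is NaN)
naive_b(800.0)   # → 1.0
```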

Clearly, though, if you do something like this in your Turing model:

```julia
# ... inside the @model body
data ~ Normal(sigmoid(a + b * x), 1.0)
# ...
```

then suddenly you have a runtime branch in your code, and it’s not even obvious, because you might not realize how sigmoid is implemented. In fact, the upshot is that if runtime if statements aren’t allowed, you can’t call any function in your Turing model unless you know the entire code of that function and everything it calls. That would include things like sin, cos, log, tan, or who knows what!

This all becomes really problematic!

ChrisRackauckas · October 16, 2022, 5:47am

typo

ChrisRackauckas · October 16, 2022, 5:47am

That’s fine if you know it will always hit the same branch. What it does is compile code that always takes the same branch, so that branch had better be the correct one.

ChrisRackauckas · October 16, 2022, 5:49am

Which is why it’s not the default and is “use at your own risk”. SciML’s SciMLSensitivity.jl has an automatic branching check that looks through your code (recursively) to see whether it’s safe to use.

dlakelan · October 16, 2022, 2:20pm

Yeah, this was my thought. I’m no expert at how this works, but based on my naive understanding the concern is whether the runtime branches that are taken vary during execution. Specifically, that will happen if a branch depends on the value of a parameter. It should always be OK to use branches based solely on the values in the data array, as it is constant through the whole sampling process. So, for example, it should be fine to model females and males differently, or to model each school, county, state, event, type of animal, or manufacturer of a product differently, as long as you’re making those choices by looking in the data array and deciding what to do.
