Clarification on setrdcache in Turing?

So, https://turing.ml/dev/docs/using-turing/autodiff says:

However, note that the use of caching in certain types of models can lead to incorrect results and/or errors. Models for which the compiled tape can be safely cached are models with fixed size loops and no run-time if statements. Compile-time if statements are fine.

I’m not clear what the difference is between “runtime if statements” and “compile-time if statements”.

Suppose I’m looping through some of the data

for i in 1:10
    if data[i] > 1.0
        pred[i] = myfun(data[i])
    else
        pred[i] = myotherfun(data[i])
    end
end

Is this “compile-time” or “runtime”? I’d assume it’s runtime because when I “run” the model it makes decisions, but maybe not, because every time it goes through this loop it’ll make the same decision because the data is fixed?

It’s clear to me that if I make decisions based on the value of parameters then a cached copy of the “tape” would be wrong for some parameter values. In the absence of any branches on parameter values it’s not clear what would go wrong.

It’s run time because the information inside an array is not known at compile time. The compiler only knows type information (data is an array) but does not know the values. In general it can do some constant propagation, but not into arrays.
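To make the type-versus-value distinction concrete, here is a minimal sketch. The names `scale` and `pick` are made up for illustration and are not part of Turing or ReverseDiff.

```julia
# Branch on a type: T is known when the method is compiled for a concrete
# argument type, so the compiler can typically resolve the condition ahead of time.
scale(x::T) where {T} = T <: Integer ? 2 * x : 2.5 * x

# Branch on a value stored in an array: the compiler knows data is a
# Vector{Float64} but not what it contains, so this is decided at run time.
pick(data::Vector{Float64}, i::Int) = data[i] > 1.0 ? :big : :small

scale(2)              # takes the Integer branch
scale(2.0)            # takes the non-Integer branch
pick([0.5, 3.0], 2)   # outcome depends on the stored value
```

The first condition depends only on what type `x` has; the second cannot be answered without looking inside the array.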

I think you meant to say “it’s run time” ??

So I assume something like:

a = (1,2,3)
if a[2] == 2

would be a compile time if statement, that is, it’ll get resolved at compile time.

What I’m having a hard time understanding is whether Turing with ReverseDiff and a compiled tape will actually have a problem with something like:

if data[i] > 1
...
else
...
end

Assume data is passed into the model function, is constant throughout the execution of the sampling procedure, and therefore every time it hits this branch it will always take the same branch.

Let’s consider another example. Imagine score and school are both data passed in.

Compare the case where school can be either 1 or 2… and we do:

score[i] ~ Normal(coef[school[i]],1)

vs the case where school can either be 10 or 20… and we do:

score[i] ~ Normal(school[i] == 10 ? coef[1] : coef[2],1)

Clearly the second one has a runtime if statement (inside the ?: operator) but both do exactly the same thing computationally.

I think we need a more nuanced description of what is and isn’t OK in a Turing model with compiled tape. To me it seems most likely that the problem would be any branch predicated on the value of a parameter in which case which branch is taken would vary from one sample to another, and replaying the tape would fail.

Models where the value of the data in one column selects a different model for the data in another column are incredibly common, and if that’s an actual problem we need to make it very explicit in the docs.

Here’s another example

function sigmoid(x)
    if x < zero(x)
        exp(x)/(one(x) + exp(x))
    else
        one(x)/(one(x) + exp(-x))
    end
end

This is just using branches to avoid overflow/underflow so that exp(x) always takes a negative argument. This is a C^\infty function.
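As a quick sanity check of that claim, here is a small sketch comparing the two branches directly. The names `left` and `right` are made up for illustration; they are just the two arms of the `if` above.

```julia
# The two arms of the sigmoid above, written as separate functions.
left(x)  = exp(x) / (one(x) + exp(x))
right(x) = one(x) / (one(x) + exp(-x))

# For moderate x both arms compute the same value.
xs = -5.0:0.5:5.0
agree = all(isapprox(left(x), right(x); atol=1e-12) for x in xs)

# For large positive x, exp(x) overflows to Inf and the left arm gives Inf/Inf.
overflowed = isnan(left(1000.0))
stable     = right(1000.0) == 1.0
```

So the branch changes which formula is evaluated, not the function being computed.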

Clearly though if you do something in your Turing model like:

   ...
   data ~ Normal(sigmoid(a+b*x),1.0)
...

suddenly you have a runtime branch in your code, and it’s not even obvious, because you might not realize how sigmoid is implemented. In fact, the upshot is that if runtime if statements aren’t allowed, you can’t call any function in your Turing model unless you know the entire code of that function and everything it calls. That would include things like sin or cos or log or tan or who knows what!

This all becomes really problematic!

typo

That’s fine if you know it will always hit the same branch. What it does is compile code that will always take the same branch, so that branch had better be the correct one.

Which is why it’s not a default and it’s “use at your own risk”. In SciML’s SciMLSensitivity.jl there is an automatic branching check that looks through your code (recursively) to see if it’s safe to use.

Yeah, this was my thought. I’m no expert at how this works, but based on my naive understanding the concern is whether the runtime branches that are taken vary during execution. Specifically, that will happen if a branch depends on the value of a parameter. It should always be OK to use branches based solely on the values in the data array, as it is constant through the whole sampling process. Examples include modeling females and males differently, or modeling each school or each county or each state or each event or each type of animal or each manufacturer of a product… differently, where you’re making choices by looking in the data array and deciding what to do.
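The distinction between data branches and parameter branches can be sketched with a toy "record once, replay later" function. This is only a stand-in for the idea of a cached tape, not ReverseDiff’s API or implementation; `record_tape` and `direct` are hypothetical names invented for this example.

```julia
# Toy stand-in for tape recording (NOT ReverseDiff's API): evaluate the branch
# conditions once and return a closure that replays only the branches taken.
function record_tape(theta0::Float64, data::Vector{Float64})
    # Branch on data: data is fixed for the whole run, so freezing it is harmless.
    data_terms = [d > 1.0 ? (t -> t * d) : (t -> t + d) for d in data]
    # Branch on the parameter: frozen at whichever arm theta0 happened to select.
    param_term = theta0 > 0 ? (t -> t^2) : (t -> -t)
    return t -> sum(f(t) for f in data_terms) + param_term(t)
end

# Reference model that re-evaluates every branch on each call.
function direct(theta::Float64, data::Vector{Float64})
    total = 0.0
    for d in data
        total += d > 1.0 ? theta * d : theta + d
    end
    return total + (theta > 0 ? theta^2 : -theta)
end

data = [0.5, 2.0]
tape = record_tape(1.0, data)   # recorded while theta > 0

same_branch  = tape(3.0) == direct(3.0, data)     # parameter stays on the recorded arm
other_branch = tape(-3.0) == direct(-3.0, data)   # parameter crosses to the other arm
```

Here `same_branch` is true and `other_branch` is false: the data-dependent branches replay correctly for every parameter value, while the parameter-dependent branch is only right as long as the parameter stays on the side it was recorded on.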
