Turing Sampling specified parameters

In my model, I have

@model function runmodel(stuff, parameters)
    predictions, otherstuff = model(stuff, parameters)
    obs, var = findobsvar(stuff, otherstuff)
    obs ~ MvNormal(predictions, Diagonal(var))
end

where findobsvar is a helper function which takes in stuff and otherstuff and returns the observations and variances I need for my multivariate normal distribution.

However, when I run my model using sample, I find that it has obs as a parameter in addition to parameters.

How do I stop that? I only want parameters as my variable since I’m only trying to constrain parameters of my model. findobsvar is just a helper function in getting the values of obs and var within the model.

My understanding is that Turing looks at the arguments to the model function to decide whether a quantity on the LHS of ~ is a parameter or data.

Basically pass obs as an argument to runmodel and your problem will go away.
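For example (a minimal sketch; the prior here is a placeholder, and `model`/`findobsvar` are whatever the real model uses):

```julia
using Turing, Distributions, LinearAlgebra

# Because `obs` is now a model argument, Turing treats `obs ~ ...` as
# conditioning on data rather than as a parameter to sample.
@model function runmodel(stuff, obs)
    parameters ~ MvNormal(zeros(2), I)          # placeholder prior
    predictions, otherstuff = model(stuff, parameters)
    _, var = findobsvar(stuff, otherstuff)      # var still computed in-model
    obs ~ MvNormal(predictions, Diagonal(var))  # observed, not sampled
end
```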

Perhaps Turing devs have some more nuanced view

But findobsvar depends on otherstuff, so that wouldn’t work…

Your data depends on your parameters?

Yes, well, kind of. The set of data I use to measure my predictions against (and therefore the goodness of my parameters) varies depending on my predictions. Specifically, I have binned data, and where my parameters fall affects which bin I pull my observations from.

This sounds quite problematic from a consistent inference perspective.

I suggest you model the binned data. You might do that by generating predictions, then binning the predictions and comparing the bins to the data bins.

Edit: or by using models for binned data directly with CDF functions in your likelihood
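A rough sketch of the first option (names like `preds`, `edges`, `data_means`, `data_vars` are made up here):

```julia
# Bin the model's predictions on the same fixed edges as the data, then
# compare bin-by-bin -- every bin is scored for every parameter value.
function bin_means(preds, edges)
    nb = length(edges) - 1
    sums, counts = zeros(nb), zeros(Int, nb)
    for p in preds
        i = searchsortedlast(edges, p)   # which bin does this prediction fall in?
        if 1 <= i <= nb
            sums[i] += p
            counts[i] += 1
        end
    end
    return sums ./ max.(counts, 1)
end

# inside the @model, something like:
#   bin_means(predictions, edges) .~ Normal.(data_means, sqrt.(data_vars))
```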

I get what you’re saying, but modeling the binned data is kind of the entire point of my model, i.e.—if my model is right, then the results I get from my model should be the same as the binned data.

Say model takes in x (part of stuff) and returns two outputs: y and z. I have binned relations relating x to y and y to z.

The relation from x to y is easy to fit, because x is an input: I can precalculate the y values using the bins (which are binned in x) and pass them in as obs, an argument to runmodel. That was what I had in the first version of my code.

The issue arises when I also need to constrain using the relation from y to z, which is binned in y. I don’t know what y is until I actually pass stuff through model and it spits out some y; only at that point can I go to the binned data (binned in y) and look up what the data for z is. And only then can I compare the predicted z from model to the data z from the binned data (from findobsvar).

Now, if the model is ‘right’, then the multivariate normal distribution should have each of the errors resulting from the model minimized (if we’re thinking about this in a negative-log-likelihood sort of way, which I think is equivalent to the multivariate normal).

Let me see if I understand. You have measures of variables x, y, z, a model such that if you know x and some parameters you can predict y, and a model such that if you know y and some parameters you can predict z.

And you have a data structure where for each bin value in x there is a dataset in y, and for each bin value in y there is a dataset in z

Is that about the size of it?

Here is my question: evidently you have data for x, y, z values that are not compatible with a single parameter… That is, for a given parameter you will narrow down the x possibilities, and then that narrows down the y, and that narrows down the z… But for a different parameter you will use different x, y, z values!!! That’s not going to give you a consistent inference.

Imagine you are trying to find out something about mammals in North America… If your parameter is less than 0 you will compare the results to Opossums, if the parameter is between 0 and 1 you will compare to dogs, if it’s greater than 1 you will compare to bears…

It makes no sense. So I’m guessing I’m missing something.

For example, it would make perfect sense to me if there were 3 parameters: one for the Opossums, with a prior that constrains it to less than 0; one for dogs, with a prior that constrains it to (0, 1); and one for bears, with a prior that it’s > 1…

Then I grab the value of the Opossums, predict the opossum data and compare to actual, predict the dog data from the dog parameter, and the bear data from the bear parameter… Ultimately I’m comparing to the full dataset every time!

I think

Let me see if I understand. You have measures of variables x, y, z, a model such that if you know x and some parameters you can predict y, and a model such that if you know y and some parameters you can predict z.
And you have a data structure where for each bin value in x there is a dataset in y, and for each bin value in y there is a dataset in z

makes sense to me. I basically have a big table somewhere which has bins in x and gives me what y is, with an error. Say, for x between 1 and 1.1, y=3x+/-(0.1x); then for x between 1.1 and 1.2, y=2.7x+/-(0.15x), etc. And I have bins of this x for all the values which I feed into model. Similarly, I have the same data structure for bins in y, but with values in z, e.g. for y between 10 and 12, z=y^0.2+/-(5y), and for y between 12 and 14, z=y^0.1+/-(10y). Of course, I’m completely making up these values here; the real values in my data come from an actual physical relation.

And so to me, what makes sense is this: when I pass in x to my model, it gives me some y and z. I can immediately have the three components of the normal distribution which is (observation-prediction)^2/variance for the x-y relation: observation is from my binned data, prediction is from my model, and variance is whatever the error that my binned data tells me the error is. However, in order to get the y-z relation, I need to get the value of y which I got from my x-y relation, and feed it to my y-z relation to get my z value to compare with the z from my model. I think another way to think of these relations is that y is related to x via a piecewise function described by my binned data, and likewise for the y-z relation.

So at the end of the day, I’m just using y and z from the two binned datasets to constrain my model. If my model is right, then the error in the x-y relation, and y-z relation should be on par with the scatter of my stuff, I think.
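To make that piecewise lookup concrete (using the same made-up numbers as above; `xy_bins` and `lookup` are illustrative names, not code from my model):

```julia
# Each row: (lo, hi, coefficient, relative error); for x in [lo, hi),
# the binned data says y = coefficient*x with error = relerr*x.
const xy_bins = [(1.0, 1.1, 3.0, 0.10),
                 (1.1, 1.2, 2.7, 0.15)]

# Return the binned observation and its variance for a given x.
function lookup(bins, x)
    for (lo, hi, c, r) in bins
        if lo <= x < hi
            return c * x, (r * x)^2
        end
    end
    error("x = $x is outside the binned range")
end
```

The y-z relation would be a second table of the same shape, indexed by the y that comes out of the model.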

Here’s the part that makes no sense. You’re trying to get

p(params| data) but you are calculating

p(data[subset(params)] | params) p(params)

I’m not too sure what you mean. Are you saying that my x-y relation is calculating p(data[subset(params)] | params) while my y-z relation is calculating p(params)?

No I’m saying that you are ignoring all of your data except for a subset and the subset depends on the value of the parameters. This does not tell you p(params | data)

Trying to recap/understand the model, what I see is:

  params, stuff => model_func => predictions, other
  other, stuff  => more_model => observations, variance
  predictions - observations ~ Normal(0, variance)

Since a model just calculates logprob(…), and tries to maximize on a subset of variables called “params”, this looks okay. The only problem I see is a problem with different numbers of observations. Essentially, the model might bias towards params producing few observations because then the logprob is not so penalized. Perhaps, a normalization of the logprob is in order. For example:

  logprob = logprob(predictions - observations, variance) / #observations
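Concretely, inside the model that could look like this (a sketch; `predictions`, `obs`, and `var` as in the thread, with `logpdf` from Distributions.jl):

```julia
# Divide the Gaussian log-likelihood by the number of observations so that
# parameter values selecting fewer bins aren't rewarded just for
# contributing fewer (negative) terms to the total logprob.
Turing.@addlogprob! logpdf(MvNormal(predictions, Diagonal(var)), obs) / length(obs)
```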

Does this make sense?

Ah, I see what you’re saying. I have a very large number of x values which I pass through to my model, and these different x adequately represent all the bins of x. Each of these x generates a y and a z which go into my MvNormal distribution. That’s why I’m trying to avoid having it sample obs: obs is just the collection of all these y and z values generated from the x’s, which the MvNormal compares to the data with the variance.

Yes, I think that’s all right. I figured out how to make it all work if one only needs to create a negative log likelihood, but I’m having trouble with the Turing schema and how to make it work with the MvNormal distribution. Does your method replace the whole MvNormal call with the log prob and just sample params using the log prob?

The MvNormal with a diagonal variance is just a sum of Normals. So basically a

for i in 1:length(obs)
    obs[i] ~ Normal(pred[i], sqrt(variance[i]))  # Normal takes a standard deviation, not a variance
end

This will update the logprob, length(obs) times. All the other calculations are deterministic and will not influence the likelihood.
But perhaps it is best to do it manually (not a Turing expert) with:

for i in 1:length(obs)
    Turing.@addlogprob! logpdf(Normal(pred[i], sqrt(variance[i])), obs[i])  # logpdf from Distributions.jl
end
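Put together, that manual route might look like this (a sketch; the prior is a placeholder, and `model`/`findobsvar` are the helpers from the thread):

```julia
using Turing, Distributions, LinearAlgebra

# `obs` never appears on the left of ~, so the only sampled quantity is
# `parameters`; the likelihood contribution is added by hand instead.
@model function runmodel(stuff)
    parameters ~ MvNormal(zeros(3), I)   # placeholder prior
    predictions, otherstuff = model(stuff, parameters)
    obs, var = findobsvar(stuff, otherstuff)
    Turing.@addlogprob! logpdf(MvNormal(predictions, Diagonal(var)), obs)
end
```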

I think this is equivalent to saying obs ~ MvNormal(predictions, Diagonal(var)), which I already have in my code, right? The problem is that Turing takes this to mean that obs is also a parameter to sample over, since obs doesn’t appear among the model’s arguments.

Can also be written as,

noise ~ MvNormal(obs .- pred, Diagonal(var))

will this help?

Doesn’t this also sample over noise, since noise doesn’t appear in the function definition?

The sampling is over the parameters, …some incorrect idea…
But I may be confused. I need to try this in a real session.