Turing Sampling specified parameters

I think the reverse is true: it tries to sample every variable that appears on the left of a ~ but does not appear in the model's arguments. The issue I'm having is that obs cannot appear in the arguments, as I've explained above, but this then means that the sampler tries to sample obs nonetheless. I don't think having obs come from outside would be feasible, because some predictions depend on the parameters (in a tangential kind of way, I guess).

What you say sounds right. But if you go the @addlogprob! way, there is no risk of accidentally adding parameters. And you can declare the actual parameters by giving them some sort of prior at the beginning of the model:

   param1 ~ Uniform(0, 1)
   param2 ~ Beta(1, 1)   # actually the same as Uniform(0, 1)
   # ...
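
For concreteness, a minimal sketch of that pattern (pred, obs, and sigma below are hypothetical placeholders, not anything from your model):

@model function mymodel()
    # Declared parameters: these are what the sampler explores.
    param1 ~ Uniform(0, 1)
    param2 ~ Beta(1, 1)

    # Placeholders for whatever you compute from the parameters and your lookup.
    pred  = [param1, param1 - param2]
    obs   = [0.4, 0.1]
    sigma = [0.5, 0.5]

    # The likelihood is added by hand, so obs is never treated as a parameter.
    for i in eachindex(obs)
        Turing.@addlogprob! logpdf(Normal(pred[i], sigma[i]), obs[i])
    end
end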

Are there any downsides to using the @addlogprob! method? Does it cause any of the samplers to fail?

I’m trying to make a toy model now, hopefully of a similar situation. If you can make an MWE for discussion, it would also be helpful…
Toy model so far:

using Turing, StatsPlots, Distributions

data = rand(Bool,10)

@model function toy(data)
    n = length(data)
    p ~ Beta(1,1)
    # only the first ~p*n data points get conditioned on, so which observations
    # enter the likelihood depends on the sampled value of p
    obsused = ((0:n-1)/n) .< p
    for i in 1:n
        if obsused[i]
            data[i] ~ Bernoulli(0.3)
        end
    end
end

chn = sample(toy(data), HMC(0.1, 5), 1000)

plot(chn)

Is this representative of the situation?

I can’t emphasize enough that the first thing to get right is the theory; trying to magick some data in from somewhere else via a function is… a big red flag. You evidently have some big dataset: put that thing into the arguments of your model. If there are subsets to be considered, do it within the model.

For example here, the only way you can get the actually observed data is if p=1 (or close to it). Yet the sample is for p ~ 0.1 or so. This is not meaningful inference. The other data exists…

Exactly. The observed data gives no evidence regarding p in this example, but the setup still creates a bias towards low p. This is the bias towards a low number of observations that I mentioned earlier; in more complex models it needs to be addressed.

The reason I can’t put it into the arguments of my model is that I have this ‘lookup’ function, which depends on the results of the model. There are big data frames out there that the lookup function reads, but putting them in the model arguments doesn’t change anything: the lookup is still required, so obs still doesn’t appear in the model arguments, and therefore the model will think it needs to sample that as well.

Fundamentally, what I’m trying to do is evaluate a multivariate normal likelihood. With a diagonal covariance this is equivalent to summing log probabilities of the form -(observed - predicted)^2 / (2 * variance), up to a normalization constant. It doesn’t matter, in theory, what observed and predicted are, just that they are the same kind of thing, one from the model and one from the data. I think you would agree with me that what I’m saying is reasonable so far.
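
As a quick sanity check of that equivalence (the numbers below are made up purely for illustration):

using Distributions, LinearAlgebra

pred = [1.0, 2.0, 3.0]   # hypothetical predictions
obs  = [1.2, 1.9, 3.4]   # hypothetical observations
var  = [0.5, 0.6, 0.7]   # hypothetical variances (the diagonal of the covariance)

# Joint log density under a diagonal multivariate normal...
lp_mv = logpdf(MvNormal(pred, Diagonal(var)), obs)

# ...equals the sum of per-point terms -(obs - pred)^2 / (2*var) plus the
# usual Gaussian normalization constants.
lp_sum = sum(-((obs[i] - pred[i])^2) / (2 * var[i]) - 0.5 * log(2π * var[i]) for i in eachindex(obs))

lp_mv ≈ lp_sum   # true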

Where I think you’d disagree with me is how I get my data, which I understand is pretty non-standard, but it is the data I have access to, and so this is the way I came up with to make it work. Let’s call observed whatever we get from the binned data, and predicted whatever I get out of the model. In essence, it doesn’t matter whether these observed and predicted values are of y or of z, so long as they match up, e.g. obs[1], pred[1], var[1] are to do with y, obs[2], pred[2], var[2] are to do with z, obs[3], pred[3], var[3] are to do with y, and so on.

There’s no issue if obs and var are predefined outside of the model (which is how my model worked when I only had the x-y relation). However, with the inclusion of the y-z relation I can no longer build the model this way: obs and var for that relation now depend on the values of y, which means I must create a set of obs and var ad hoc for each set of parameters for the multivariate normal/log prob to sum over. The issue is that there’s no way to explicitly tell Turing that, although I’m essentially making an ad hoc dataset, I’m not trying to sample the dataset, and that it should treat it as if I had passed the dataset in from the start.

The computational problem can be fixed with @addlogprob!. The part that worries me is whether you are including different data depending on what value the parameter takes on.

In order to get meaningful inference, every data point should appear somewhere in the model at each iteration.

Let me try to break the concern down to its most basic idea. I am concerned your model might look something like this:

@model function foo(x, y, zvals, scalex, scalesy)
   # x is a vector like [0,1,0,0,1]
   # y is a vector of vectors... each y[i] dataset corresponds to the x[i]
   # someprior() and predictor() are stand-ins for whatever you actually use
   p ~ someprior()

   zpred, xval, yvals = predictor(p)
   for z in zvals
      if z == zpred
         # only the data associated with the predicted z gets conditioned on
         x ~ Normal(xval, scalex)
         y ~ MvNormal(yvals, Diagonal(scalesy))
      end
   end
end

In other words, based on what p is, we predict a given zpred, xval, and set of yvals. Then we look in our dataset, and just condition on the data associated with that one z value.

This is not going to give valid inference for p. The other data points in your dataset don’t cease to exist because p made you predict zpred, xval, yvals; you’re just ignoring them because they’re inconvenient.

Imagine you have a model that, given a parameter, predicts the net worth of people as a single number, plus a single weight bin (x) and a set of heights (yvals) for them. You look in the census, find the people whose net worth falls into the same bin as the prediction, then compare their weight bin and heights to the predictions using a multivariate normal.

The fact is the census has people with other net worths. If your model predicts that everyone’s net worth is within a certain bin of a given value… then your model has been proven to have zero probability already, because there exist people outside that bin. The likelihood of your model is zero.

So, maybe I’m misunderstanding your problem, but I hope maybe my concern is clear above.

I have created a minimal working example.

using DataFrames, LinearAlgebra, Turing, StatsPlots, Distributions

data1 = DataFrame(bin_min = [0,10], bin_max = [10,20], val = [1,2], err = [0.5,0.6])
data2 = DataFrame(bin_min = [0,15], bin_max = [15,30], val = [1,2], err = [0.5,0.6])

x = Vector(LinRange(0, 20, 20))
function getdata1(x, data)
    bin = findfirst(data[:, "bin_max"] .>= x)
    obs = x + data[bin, "val"]
    variance = x * data[bin, "err"]
    return obs, variance
end

function getdata2(y, data)
    bin = findfirst(data[:, "bin_max"] .>= y)
    obs = y - data[bin, "val"]
    variance = y + data[bin, "err"]
    return obs, variance
end

function getdata(x, data1, data2)
    y, yvar = getdata1(x,data1)
    z, zvar = getdata2(y,data2)
    return [y,z], [yvar, zvar]
end

# (not actually called below; the same expressions are inlined in the @model)
function model(x, parameters)
    return x + parameters[1], x + parameters[1] - parameters[2]
end

@model function toy(x)
    parameters ~ Product([Uniform(0, 1), Uniform(0, 1)])
    predictions = zeros(eltype(parameters), 2 * length(x))
    obs = zeros(eltype(parameters), 2 * length(x))
    variance = zeros(eltype(parameters), 2 * length(x))
    for (i, val) in enumerate(x)
        # entries 2i-1 and 2i hold the y- and z-related quantities for x[i]
        predictions[2i-1], predictions[2i] = val + parameters[1], val + parameters[1] - parameters[2]
        obs[2i-1], obs[2i] = getdata(val, data1, data2)[1]
        variance[2i-1], variance[2i] = getdata(val, data1, data2)[2]
    end
    # obs is built inside the model and is not an argument, so the ~ below
    # makes Turing treat obs as something to sample; that is exactly the problem.
    obs ~ MvNormal(predictions, Diagonal(variance))
end

chn = sample(toy(x), MH(), 10)

plot(chn)

This runs but doesn’t work well, probably because I haven’t put much thought into making my model actually good at predicting what it needs to predict (if you use HMC it gives endless isfinite warnings, which means it probably doesn’t converge). But I hope you can see what I’m trying to achieve here. This model tries to fit not only parameters but also obs, which is not desired.

I think I understand your concern. In my model, x is well represented, so I’m not just throwing out x values that are inconvenient to me. I do understand that there is an issue where y might not be well represented (which is a headache for me) if x combined with some parameters gives a value of y that is out of bounds of my data, but so far I’m very artificially assigning an arbitrarily large negative log prob to these values to ‘discourage’ the sampler from going there. I made a minimal working example which I hope explains better what I’m trying to do.

Here is another demo example to show @addlogprob! works well:

data = rand(Bernoulli(0.7),1000);
@model function toy2fix()
    p ~ Uniform(0.001,0.999)
    N = 40
    # Compare sums over (overlapping) slices of the data to a normal
    # approximation of the corresponding binomial counts, with an inflated
    # standard deviation.
    obs = [sum(data[(10*i):(20*i)]) for i=1:N]
    pred = [p*(10*i+1) for i=1:N]
    var = [4*sqrt((10*i+1)*p*(1-p)) for i=1:N]
    for i in 1:N
        Turing.@addlogprob! loglikelihood(Normal(pred[i],var[i]),obs[i])
    end
end
chn = sample(toy2fix(), HMC(0.1, 5), 10000)
plot(chn)

It correctly puts the posterior weight around p ≈ 0.7, and it shares the key property of calculating obs/pred inside the model.
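
Applied to the earlier MWE, the same pattern might look roughly like this (just a sketch: it reuses getdata, data1, and data2 from the MWE above, and toy_fixed is a hypothetical name):

@model function toy_fixed(x, data1, data2)
    parameters ~ Product([Uniform(0, 1), Uniform(0, 1)])
    for val in x
        # Predictions depend on the parameters; obs/variance are looked up ad hoc.
        pred = [val + parameters[1], val + parameters[1] - parameters[2]]
        obs, variance = getdata(val, data1, data2)
        # The log likelihood is added directly, so obs is never sampled.
        Turing.@addlogprob! logpdf(MvNormal(pred, Diagonal(variance)), obs)
    end
end

chn = sample(toy_fixed(x, data1, data2), MH(), 1000)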


Let’s try to abstract this a bit. Here’s pseudocode:

  1. loop over x
  2. collect all the y’s that have that x
  3. collect all the z’s that have one of those y’s
  4. compare the y,z values to predicted y,z values for that given x.
  5. repeat until all x exhausted

If that’s what you’re up to, then you’re on good footing, I think.

What you want is something with the flavor

@addlogprob!(logpdf(MvNormal(predictions, covariances), observedvals))

because, as you say, if you put your “collected” observedvals on the left side of a ~ it will be treated like a parameter.

I think so long as you are looping over all your data, and just looking at it in irregular “slices” at a time, you will have a valid model; it’s just a computational problem that needs fixing with @addlogprob!, as above.


Yes, that sounds exactly like what I’m trying to do.

I just want to clarify whether the logpdf(MvNormal(...)) that you have is equivalent to the loglikelihood(Normal(...)) that @Dan has, as long as there is no covariance (variances only on the diagonal of the covariance matrix)?

Thanks so much for both of your help, and apologies for not being clearer from the beginning.

Yeah, I think loglikelihood and logpdf are going to be the same.
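
A quick check with made-up numbers confirms this, with one caveat: Normal takes a standard deviation as its second argument, while the Diagonal(...) passed to MvNormal is a covariance matrix, so the two have to be kept consistent (square the standard deviations, or take the square root of the variances):

using Distributions, LinearAlgebra

pred = [1.0, 2.0]
obs  = [1.3, 1.7]
sd   = [0.5, 0.8]   # standard deviations

lp_uni = sum(loglikelihood(Normal(pred[i], sd[i]), obs[i]) for i in eachindex(obs))
lp_mv  = logpdf(MvNormal(pred, Diagonal(sd .^ 2)), obs)   # variances on the diagonal

lp_uni ≈ lp_mv   # true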

One thing you might consider doing, though, is to preprocess your data frame to make the whole computation more convenient. For example, you can

gdf = groupby(mydata,:x)

and then pass that into the model and iterate over each group, rather than having complicated “collector” routines running inside your model evaluation:

@model function foo(gdf)
   params ~ MyPrior()                          # MyPrior is a placeholder
   for g in gdf
       # each g holds all the rows sharing one x value
       y, z = makepredictions(g.x[1], params)  # makepredictions is a placeholder
       @addlogprob!(...)                       # add the likelihood for this group
   end
end
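
For what it’s worth, a tiny illustration of what iterating over a GroupedDataFrame gives you (mydata and its columns are made up here):

using DataFrames

mydata = DataFrame(x = [1, 1, 2, 2, 2], y = [0.9, 1.1, 2.2, 1.8, 2.0])
gdf = groupby(mydata, :x)

for g in gdf
    # Each g is a SubDataFrame holding every row that shares one x value,
    # so each observation is visited exactly once per pass over gdf.
    println(g.x[1], " => ", g.y)
end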

Thanks for your solution. After running some tests, I’ve been finding that Metropolis-Hastings runs but gives very poor effective sample sizes, while the gradient-based methods don’t work at all because of endless isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true) warnings. I don’t really know how to diagnose an error like this. From what I understand it means my model is poorly specified, but I don’t know how to change the model to fix it. FWIW, the previous version of my model, which had a static obs and var, only occasionally threw up isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false) warnings, so something in the change must have triggered this…

It’s really hard to offer general advice here, as compared with working code or a minimal working example of the problem.

If the log probability is non-finite, this basically means you’re calculating log(0). It can be because your initialization is poor, but it can also be because of other problems.
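
One way to probe this (a sketch; it assumes your model has a single parameter p, as in the toy2fix example above, and that your installed version exposes DynamicPPL’s logjoint helper through Turing):

m = toy2fix()

# Evaluate the log joint density at hand-picked parameter values.
# A finite value at a "known good" point suggests the initialization is the problem;
# -Inf everywhere suggests a bug in the model itself.
Turing.DynamicPPL.logjoint(m, (p = 0.7,))
Turing.DynamicPPL.logjoint(m, (p = 0.999,))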

Thanks for replying. The docs seem to indicate that ℓπ corresponds to the “cached neg potential energy for the current θ”. Since that’s the only term it is complaining about, it doesn’t appear that I’m calculating log(0); rather, something in my model is making that quantity non-finite. I understand this may be highly model-specific, but it seems odd that the sampler never finds its way to convergence, since that would mean the potential energy is infinite everywhere in my model. I’m not an expert on HMC methods, but I don’t think that sounds right, because I know the ‘right answer’ is somewhere in that space.

The potential energy in the Hamiltonian is -ln(p) where p is the probability density of your model evaluated at theta. So either your initial condition is not good, or your model has a bug. Try to pass a known good initial condition.

Thanks for your suggestion. How do you set an initial condition in Turing?
