Turing Sampling specified parameters

I think the reverse is true: it tries to sample every variable that appears on the left of a ~ but does not appear in the model's arguments. The issue I'm having is that obs cannot appear in the arguments, as I've explained above, but this then means that the sampler tries to sample obs nonetheless. I don't think having obs come from outside would be feasible, because some predictions depend on the parameters (in a tangential kind of way, I guess).

What you say sounds right. But if you go the @addlogprob! way, there is no risk of accidentally adding parameters. And you can declare the actual parameters by giving them some sort of prior at the beginning of the model:

   param1 ~ Uniform(0, 1)
   param2 ~ Beta(1, 1)   # actually the same as Uniform(0, 1)
   # ...
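
For concreteness, a minimal sketch of that pattern (pred, obs, and sigma below are hypothetical placeholders, not anything from your model):

@model function mymodel()
    # Declared parameters: these are what the sampler explores.
    param1 ~ Uniform(0, 1)
    param2 ~ Beta(1, 1)

    # Placeholders for whatever you compute from the parameters and your lookup.
    pred  = [param1, param1 - param2]
    obs   = [0.4, 0.1]
    sigma = [0.5, 0.5]

    # The likelihood is added by hand, so obs is never treated as a parameter.
    for i in eachindex(obs)
        Turing.@addlogprob! logpdf(Normal(pred[i], sigma[i]), obs[i])
    end
end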

Are there any downsides to using the @addlogprob! method? Does it cause any of the samplers to fail?

I’m trying to make a toy model now, hopefully of a similar situation. If you can make an MWE for discussion, it would also be helpful…
Toy model so far:

using Turing, StatsPlots, Distributions

data = rand(Bool,10)

@model function toy(data)
    n = length(data)
    p ~ Beta(1,1)
    # only the first ~p*n data points get conditioned on, so which observations
    # enter the likelihood depends on the sampled value of p
    obsused = ((0:n-1)/n) .< p
    for i in 1:n
        if obsused[i]
            data[i] ~ Bernoulli(0.3)
        end
    end
end

chn = sample(toy(data), HMC(0.1, 5), 1000)

plot(chn)

Is this representative of the situation?

I can’t emphasize enough that the first thing to get right is the theory; trying to magick some data in from somewhere else via a function is… a big red flag. You evidently have some big dataset: put that thing into the arguments of your model. If there are subsets to be considered, do it within the model.

For example here, the only way you can get the actually observed data is if p=1 (or close to it). Yet the sample is for p ~ 0.1 or so. This is not meaningful inference. The other data exists…

Exactly. The observed data gives no evidence regarding p in this example, but the setup still creates a bias towards low p. This is the bias towards a low number of observations that I mentioned earlier; in more complex models it needs to be addressed.

The reason I can’t put it into the arguments of my model is that I have this ‘lookup’ function, which depends on the results of the model. There are big data frames out there that the lookup function reads, but putting them in the model arguments doesn’t change anything: the lookup is still required, so obs still doesn’t appear in the model arguments, and therefore the model will think it needs to sample that as well.

Fundamentally, what I’m trying to do is evaluate a multivariate normal likelihood. With a diagonal covariance this is equivalent to summing log probabilities of the form -(observed - predicted)^2 / (2 * variance), up to a normalization constant. It doesn’t matter, in theory, what observed and predicted are, just that they are the same kind of thing, one from the model and one from the data. I think you would agree with me that what I’m saying is reasonable so far.
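
As a quick sanity check of that equivalence (the numbers below are made up purely for illustration):

using Distributions, LinearAlgebra

pred = [1.0, 2.0, 3.0]   # hypothetical predictions
obs  = [1.2, 1.9, 3.4]   # hypothetical observations
var  = [0.5, 0.6, 0.7]   # hypothetical variances (the diagonal of the covariance)

# Joint log density under a diagonal multivariate normal...
lp_mv = logpdf(MvNormal(pred, Diagonal(var)), obs)

# ...equals the sum of per-point terms -(obs - pred)^2 / (2*var) plus the
# usual Gaussian normalization constants.
lp_sum = sum(-((obs[i] - pred[i])^2) / (2 * var[i]) - 0.5 * log(2π * var[i]) for i in eachindex(obs))

lp_mv ≈ lp_sum   # true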

Where I think you’d disagree with me is how I get my data, which I understand is pretty non-standard, but it is the data I have access to, and so this is the way I came up with to make it work. Let’s call observed whatever we get from the binned data, and predicted whatever I get out of the model. In essence, it doesn’t matter whether these observed and predicted values are of y or of z, so long as they match up, e.g. obs[1], pred[1], var[1] are to do with y, obs[2], pred[2], var[2] are to do with z, obs[3], pred[3], var[3] are to do with y, and so on.

There’s no issue if obs and var are predefined outside of the model (which is how my model worked when I only had the x-y relation). However, with the inclusion of the y-z relation I can no longer build the model this way: obs and var for that relation now depend on the values of y, which means I must create a set of obs and var ad hoc for each set of parameters for the multivariate normal/log prob to sum over. The issue is that there’s no way to explicitly tell Turing that, although I’m essentially making an ad hoc dataset, I’m not trying to sample the dataset, and that it should treat it as if I had passed the dataset in from the start.

The computational problem can be fixed with @addlogprob!. The part that worries me is whether you are including different data depending on what value the parameter takes on.

In order to get meaningful inference, every data point should appear somewhere in the model at each iteration.

Let me try to break the concern down to its most basic idea. I am concerned your model might look something like this:

@model function foo(x, y, zvals, scalex, scalesy)
   # x is a vector like [0,1,0,0,1]
   # y is a vector of vectors... each y[i] dataset corresponds to the x[i]
   # someprior() and predictor() are stand-ins for whatever you actually use
   p ~ someprior()

   zpred, xval, yvals = predictor(p)
   for z in zvals
      if z == zpred
         # only the data associated with the predicted z gets conditioned on
         x ~ Normal(xval, scalex)
         y ~ MvNormal(yvals, Diagonal(scalesy))
      end
   end
end

In other words, based on what p is, we predict a given zpred, xval, and set of yvals. Then we look in our dataset, and just condition on the data associated with that one z value.

This is not going to give valid inference for p. The other data points in your dataset don’t cease to exist because p made you predict zpred, xval, yvals; you’re just ignoring them because they’re inconvenient.

Imagine you have a model that, given a parameter, predicts the net worth of people as a single number, plus a single weight bin (x) and a set of heights (yvals) for them. You look in the census, find the people whose net worth falls into the same bin as the prediction, then compare their weight bin and heights to the predictions using a multivariate normal.

The fact is the census has people with other net worths. If your model predicts that everyone’s net worth is within a certain bin of a given value… then your model has been proven to have zero probability already, because there exist people outside that bin. The likelihood of your model is zero.

So, maybe I’m misunderstanding your problem, but I hope maybe my concern is clear above.

I have created a minimal working example.

using DataFrames, LinearAlgebra, Turing, StatsPlots, Distributions

data1 = DataFrame(bin_min = [0,10], bin_max = [10,20], val = [1,2], err = [0.5,0.6])
data2 = DataFrame(bin_min = [0,15], bin_max = [15,30], val = [1,2], err = [0.5,0.6])

x = Vector(LinRange(0, 20, 20))
function getdata1(x, data)
    bin = findfirst(data[:, "bin_max"] .>= x)
    obs = x + data[bin, "val"]
    variance = x * data[bin, "err"]
    return obs, variance
end

function getdata2(y, data)
    bin = findfirst(data[:, "bin_max"] .>= y)
    obs = y - data[bin, "val"]
    variance = y + data[bin, "err"]
    return obs, variance
end

function getdata(x, data1, data2)
    y, yvar = getdata1(x,data1)
    z, zvar = getdata2(y,data2)
    return [y,z], [yvar, zvar]
end

# (not actually called below; the same expressions are inlined in the @model)
function model(x, parameters)
    return x + parameters[1], x + parameters[1] - parameters[2]
end

@model function toy(x)
    parameters ~ Product([Uniform(0, 1), Uniform(0, 1)])
    predictions = zeros(eltype(parameters), 2 * length(x))
    obs = zeros(eltype(parameters), 2 * length(x))
    variance = zeros(eltype(parameters), 2 * length(x))
    for (i, val) in enumerate(x)
        # entries 2i-1 and 2i hold the y- and z-related quantities for x[i]
        predictions[2i-1], predictions[2i] = val + parameters[1], val + parameters[1] - parameters[2]
        obs[2i-1], obs[2i] = getdata(val, data1, data2)[1]
        variance[2i-1], variance[2i] = getdata(val, data1, data2)[2]
    end
    # obs is built inside the model and is not an argument, so the ~ below
    # makes Turing treat obs as something to sample; that is exactly the problem.
    obs ~ MvNormal(predictions, Diagonal(variance))
end

chn = sample(toy(x), MH(), 10)

plot(chn)

This runs but doesn’t work well, probably because I haven’t put much thought into making my model actually good at predicting what it needs to predict (if you use HMC it gives endless isfinite warnings, which means it probably doesn’t converge). But I hope you can see what I’m trying to achieve here. This model tries to fit not only parameters but also obs, which is not desired.

I think I understand your concern. In my model, x is well represented, so I’m not just throwing out x values that are inconvenient to me. I do understand that there is an issue where y might not be well represented (which is a headache for me) if x combined with some parameters gives a value of y that is out of bounds of my data, but so far I’m very artificially assigning an arbitrarily large negative log prob to these values to ‘discourage’ the sampler from going there. I made a minimal working example which I hope explains better what I’m trying to do.

Here is another demo example to show @addlogprob! works well:

data = rand(Bernoulli(0.7),1000);
@model function toy2fix()
    p ~ Uniform(0.001,0.999)
    N = 40
    # Compare sums over (overlapping) slices of the data to a normal
    # approximation of the corresponding binomial counts, with an inflated
    # standard deviation.
    obs = [sum(data[(10*i):(20*i)]) for i=1:N]
    pred = [p*(10*i+1) for i=1:N]
    var = [4*sqrt((10*i+1)*p*(1-p)) for i=1:N]
    for i in 1:N
        Turing.@addlogprob! loglikelihood(Normal(pred[i],var[i]),obs[i])
    end
end
chn = sample(toy2fix(), HMC(0.1, 5), 10000)
plot(chn)

It correctly puts the posterior weight around p ≈ 0.7, and it shares the key property of calculating obs/pred inside the model.
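
Applied to the earlier MWE, the same pattern might look roughly like this (just a sketch: it reuses getdata, data1, and data2 from the MWE above, and toy_fixed is a hypothetical name):

@model function toy_fixed(x, data1, data2)
    parameters ~ Product([Uniform(0, 1), Uniform(0, 1)])
    for val in x
        # Predictions depend on the parameters; obs/variance are looked up ad hoc.
        pred = [val + parameters[1], val + parameters[1] - parameters[2]]
        obs, variance = getdata(val, data1, data2)
        # The log likelihood is added directly, so obs is never sampled.
        Turing.@addlogprob! logpdf(MvNormal(pred, Diagonal(variance)), obs)
    end
end

chn = sample(toy_fixed(x, data1, data2), MH(), 1000)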


Let’s try to abstract this a bit. Here’s pseudocode:

  1. loop over x
  2. collect all the y’s that have that x
  3. collect all the z’s that have one of those y’s
  4. compare the y,z values to predicted y,z values for that given x.
  5. repeat until all x exhausted

If that’s what you’re up to, then you’re on good footing, I think.

What you want is something with the flavor

@addlogprob!(logpdf(MvNormal(predictions, covariances), observedvals))

because, as you say, if you put your “collected” observedvals on the left side of a ~ it will be treated like a parameter.

I think so long as you are looping over all your data, and just looking at it in irregular “slices” at a time, you will have a valid model; it’s just a computational problem that needs fixing with @addlogprob!, as above.


Yes, that sounds exactly like what I’m trying to do.

I just want to clarify whether the logpdf(MvNormal(...)) that you have is equivalent to the loglikelihood(Normal(...)) that @Dan has, as long as there is no covariance (variances only on the diagonal of the covariance matrix)?

Thanks so much for both of your help, and apologies for not being clearer from the beginning.

Yeah, I think loglikelihood and logpdf are going to be the same.
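
A quick check with made-up numbers confirms this, with one caveat: Normal takes a standard deviation as its second argument, while the Diagonal(...) passed to MvNormal is a covariance matrix, so the two have to be kept consistent (square the standard deviations, or take the square root of the variances):

using Distributions, LinearAlgebra

pred = [1.0, 2.0]
obs  = [1.3, 1.7]
sd   = [0.5, 0.8]   # standard deviations

lp_uni = sum(loglikelihood(Normal(pred[i], sd[i]), obs[i]) for i in eachindex(obs))
lp_mv  = logpdf(MvNormal(pred, Diagonal(sd .^ 2)), obs)   # variances on the diagonal

lp_uni ≈ lp_mv   # true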

One thing you might consider doing, though, is to preprocess your data frame to make the whole computation more convenient. For example, you can

gdf = groupby(mydata,:x)

and then pass that into the model and iterate over each group, rather than having complicated “collector” routines running inside your model evaluation:

@model function foo(gdf)
   params ~ MyPrior()                          # MyPrior is a placeholder
   for g in gdf
       # each g holds all the rows sharing one x value
       y, z = makepredictions(g.x[1], params)  # makepredictions is a placeholder
       @addlogprob!(...)                       # add the likelihood for this group
   end
end
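
For what it’s worth, a tiny illustration of what iterating over a GroupedDataFrame gives you (mydata and its columns are made up here):

using DataFrames

mydata = DataFrame(x = [1, 1, 2, 2, 2], y = [0.9, 1.1, 2.2, 1.8, 2.0])
gdf = groupby(mydata, :x)

for g in gdf
    # Each g is a SubDataFrame holding every row that shares one x value,
    # so each observation is visited exactly once per pass over gdf.
    println(g.x[1], " => ", g.y)
end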

Thanks for your solution. After running some tests, I’ve been finding that Metropolis-Hastings runs but gives very poor effective sample sizes, while the gradient-based methods don’t work at all because of endless isfinite.((θ, r, ℓπ, ℓκ)) = (true, true, false, true) warnings. I don’t really know how to diagnose an error like this. From what I understand it means my model is poorly specified, but I don’t know how to change the model to fix it. FWIW, the previous version of my model, which had a static obs and var, only occasionally threw up isfinite.((θ, r, ℓπ, ℓκ)) = (true, false, false, false) warnings, so something in the change must have triggered this…

It’s really hard to offer general advice here, as compared with working code or a minimal working example of the problem.

If the log probability is non-finite, this basically means you’re calculating log(0). It can be because your initialization is poor, but it can also be because of other problems.
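
One way to probe this (a sketch; it assumes your model has a single parameter p, as in the toy2fix example above, and that your installed version exposes DynamicPPL’s logjoint helper through Turing):

m = toy2fix()

# Evaluate the log joint density at hand-picked parameter values.
# A finite value at a "known good" point suggests the initialization is the problem;
# -Inf everywhere suggests a bug in the model itself.
Turing.DynamicPPL.logjoint(m, (p = 0.7,))
Turing.DynamicPPL.logjoint(m, (p = 0.999,))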

Thanks for replying. The docs seem to indicate that ℓπ corresponds to the “cached neg potential energy for the current θ”. Since that’s the only term it is complaining about, it doesn’t appear that I’m calculating log(0); rather, something in my model is making that quantity non-finite. I understand this may be highly model-specific, but it seems odd that the sampler never finds its way to convergence, since that would mean the potential energy is infinite everywhere in my model. I’m not an expert on HMC methods, but I don’t think that sounds right, because I know the ‘right answer’ is somewhere in that space.

The potential energy in the Hamiltonian is -ln(p) where p is the probability density of your model evaluated at theta. So either your initial condition is not good, or your model has a bug. Try to pass a known good initial condition.

Thanks for your suggestion. How do you set an initial condition in Turing?
