Priors for generated quantities in Turing?

I have a Turing model where I need (in part) to do something like this:

@model test() = begin
    x = ones(10)
    sigma ~ InverseGamma(1, 1)
    x ~ Normal(0, sigma)
end

m1 = sample(test(), NUTS(0.65), 1000)

I was a bit surprised that in the model above, x is not fixed within the model, but ends up being estimated.

Basically, in my “real” model, I have a quantity x the that is derived directly from other (estimated) parameters in the model. x itself should not be estimated.

However, because the generated quantity has a prior that varies – and impacts/is impacted by – other parts of the model, I still need the overall model likelihood to take that into account, so it’s not appropriate to just “not include” a prior on the derived quantity.

Searching didn’t turn up anything useful about how to do something like this, or even whether it’s possible - any Turing users have any idea how to do this sort of thing? Would making a custom function to increment the log-likelihood directly work? Is there an alternative syntax to flag variables as “fixed”?

Because x is fixed in this case, you should pass it into the model as an input, ie something like this:

x = ones(10)

@model test(x) = begin
    sigma ~ InverseGamma(1, 1)
    x ~ Normal(0, sigma)
end

m1 = sample(test(x), NUTS(0.65), 1000)

Turing has 2 specific interpretations of the left hand side of ~ depending on whether it’s a value that’s passed into the model or generated within the block. Seems you want the former (ie treat x as data aka fixed).

2 Likes

That works for the MWE, obviously, but in my “real” model, x is derived from other random variables in the model – it can’t be passed through the initial model function, because it only exists at the point estimates for those other variables exist.

That’s one of the things I’m curious about – is there a way to tell Turing “fix this quantity” within the model? A macro or something? I was considering wrapping it in a static array, but that seems very unlikely to work (and my real problem is too big for SArrays, anyway).

Since the order of instantiation is so important in Turing, I was hoping that specifying the contents of a variable before the prior would be enough to make Turing treat it “like data”, but apparently not. :slight_smile:

I don’t entirely follow the use case but I think it’s still achievable with a small change to the suggestion. I haven’t run this but it’s worth a try to pass x as a variable (convincing Turing to treat it as fixed) and then reconstruct it as needed within your sample loop:

x = ones(10)

@model test(x) = begin
    x = do_something_to_x(other_stuff)
    sigma ~ InverseGamma(1, 1)
    x ~ Normal(0, sigma)
end

m1 = sample(test(x), NUTS(0.65), 1000)

Huh, sneaky. :stuck_out_tongue: That might be worth testing out. Thanks!

And if you’re curious about the use case, it’s Bayesian factor analysis with automatic relevance determination (ARD) priors on the factor loadings (following Tipping, 2001), while also using sum-to-constant constraints on the factor loadings to handle identification of the factor model (Little, Slegers, & Card, 2006).

Basically, if you have one factor and three observed items, you can freely estimate two of your factor loadings, but you require that all three factor loadings together sum to 3.0; this forces the “average” factor loading to be fixed at 1.0, and lets the factor model be identified without using marker variables (e.g., forcing one of the factor loadings to be something like 1.0) or assuming that the latent factor has a fixed variance (e.g., 1.0). It lets you compare latent means and variances across e.g. different groups, and has some nice properties in terms of the latent variable scaling.

All of the factor loadings (including the deterministic ones) have an ARD prior that tries to encourage sparsity across the matrix of factor loadings. This pushes the loadings to “simple structure” in factor analytic terms, with as many “near zero” loadings as possible. This handles both rotational invariance and interpretability in a nice way.

So, I estimate most of the matrix of factor loadings, determine what the last few “have to be” to meet the necessary identification constraints, but then all of them (estimated or derived) need to have an appropriate sparsity prior.

I’ve done this kind of model before using R/Rcpp (e.g., C++), and it worked very well, but I’m trying to move to Turing to the extent I can (and learn about it in the process).

Well, figure I’d report back just to close the loop. Seems to work!

Here, I have a vector of three numbers, where the third (x) depends on the value of the first two (y1, y2). If I just pass x as data, and put a prior on x before combining the three variables into a vector, it doesn’t work – x is still sampled as if the constraint x = 3 - (y1 + y2) didn’t exist.

However, if I pass the container z as below, and put the prior on an element of z, it does seem to work. And I confirmed (by changing the priors) that the prior on the deterministic quantity is working as it should.

@model test(z) = begin
    y1 ~ Normal(1, 0.1)
    y2 ~ Normal(0.5, 0.1)
    x = 3 - (y1 + y2)
    z = [y1, y2, x]
    z[3] ~ Normal(0, 0.1)

    Σ = diagm(0.1*ones(3))
    a ~ MvNormal(z, Σ)
end

m1 = sample(test(zeros(3)), NUTS(0.65), MCMCThreads(), 2000, 4)

mean(Array(m1)[:,1:3]; dims=2)  # Constraint to average 1 is working

Doing a bit more playing around, I found that if I create x by wrapping it’s initial definition in a deepcopy or a [], then things work fine:

@model test(x) = begin
    y1 ~ Normal(1, 0.1)
    y2 ~ Normal(0.5, 0.1)
    x = deepcopy(3 - (y1 + y2))
    x ~ Normal(0, 0.1)
    z = vcat(y1, y2, x)
    
    Σ = diagm(0.1*ones(3))
    a ~ MvNormal(z, Σ)
end

m1 = sample(test(1), NUTS(0.65), MCMCThreads(), 2000, 4)

mean(Array(m1)[:,1:3]; dims=2)

However, if you don’t pass x as data, it goes back to the initial behavior, treating x as a variable to be sampled.

So it seems like there’s something about both passing the variable as data, and doing a deep copy that makes Julia treat x as data, rather than a variable. Either alone isn’t sufficient.

A bit mysterious, but it works!

Just for the record: you can use DynamicPPL.@addlogprob! to add any quantity you want to the joint distribution. So if you want to observe something in the model, e.g. you have x ~ Normal(), you can do

DynamicPPL.@addlogprob! logpdf(Normal(), x)

to ensure that it’s observed even when not part of the arguments:)

2 Likes

Thanks! As it turns out, that’s what I ended up doing eventually.

1 Like