Feedback request: `rand` and `predict` semantics

Hi, I’m working on Tilde.jl, and thinking some more about how rand and predict should be set up.

In current development, this works:

julia> m = @model n begin
           μ ~ Normal(σ=10)
           σ ~ Exponential()
           y ~ Normal(μ, σ) ^ n
       end;

julia> obs = (y = [3.0, 4.0],);

julia> post = m(2) | obs
ModelPosterior given
    arguments    (:n,)
    observations (:y,)
@model n begin
        μ ~ Normal(σ = 10)
        σ ~ Exponential()
        y ~ Normal(μ, σ) ^ n
    end

julia> r = rand(post)
(μ = 4.99758715428609, σ = 1.582563929898804)

julia> predict(post, r)
(y = [3.6962760705758146, 3.661799255763297],)

I think this makes sense for this example. But what if you had, say y = [3.0, missing]? I think conditioning on missing should be the same as not conditioning at all. So for arrays, conditioning on y ought to really mean conditioning on the known elements of y, so rand should sample the others.

So this would look something like

julia> rand(m(2) | (y = [3.0, missing]))
(μ = 4.99758715428609, σ = 1.582563929898804, y = [3.0, 6.346809940724782])

But it seems weird to have a completely different return type when y is fully observed. So maybe the fully-observed case should instead work like this:

julia> rand(post)
(μ = 4.99758715428609, σ = 1.582563929898804, y = [3.0, 4.0])

This is equivalent to calling rand on the model where y ~ Normal(μ, σ) ^ n is replaced with y ~ Dirac([3.0, 4.0]), or could also be seen as calling rand on the original (unconditioned) model, and updating that result with any observations.

logdensity / rand interaction

This seems fine, but we need to be sure we have consistent semantics. Since rand and logdensityof are both defined for a post, we should be able to call

logdensityof(post, rand(post))

So… maybe the semantics are that logdensityof can be called without including Dirac components, and if they are given they must match what the model specifies. Does this make sense?

predict / rand interaction

Currently we can do

julia> predict(post, rand(post))
(y = [17.638620683052217, 17.891076168944927],)

What does this look like if rand also has a y slot? I think this could be something like

Among all sampled (~) variables in a posterior model post, predict(post, pars) samples those that are either (1) not included in pars or (2) are conditioned upon in post

Thoughts?

I think it’s important to get this right, so I’d appreciate any feedback on this. Thanks :slight_smile:

1 Like

"and if they are given they must match what the model specifies. "

Presumably this would throw an error if they don’t. It’s tempting to return -Inf in this case, but I can’t think of a real situation where people pass in the wrong y values and it’s not just an error.

I think rather than decide what happens if they don’t match, it’s helpful to ask what rand means on a posterior.

If it means “sample form the unconditioned model, except for observed values which are treated as Dirac” then the case where the input doesn’t match that needs to return a log-density of -Inf, to match Dirac semantics.

Maybe it should mean something different, but I don’t have any alternatives yet.

My (maybe bad) 2c as a casual onlooker but I was indeed curious what rand(post) meant, not knowing for Tilde and not being able to remember for other PPLs. It almost makes the most sense to me that it would mean a sample from the posterior, since I think of the conditioned posterior as some implicit distribution, and rand samples distributions. But ofc this requires sophisticated sampling. So one thing it could be is an error unless its trivially sampleable. Not sure if that makes any of these other downstream issues irrelevant.

I would tend to agree that calling rand() on the posterior makes me think of sampling from the posterior.

1 Like

This is a great point, makes me think we should just disallow rand on conditional measures. You can always get the prior and call rand on that if. The extra step is easy, and it makes things a lot clearer.

2 Likes

Not so much disallow but only define it if sampling the conditional is day O(1) and then rand should give such samples

@Tamas_Papp made a similar point a while back, and I agree. To try to nail this down a little, I think a sampling procedure should only be callable using rand if it’s

  1. “fast” in some sense ^\dagger
  2. exact
  3. iid

^\dagger though O(1) might be too restrictive, for example MvNormal requires matrix-vector multiply

1 Like