Target variables are functions of Gaussians (aka how to define a deterministic multivariate distribution?)

I have a model where observations come from something of the form

@model function predictions(observations)
  a ~ Normal(…)
  for i in 1:lastindex(observations)
    b ~ MvNormal(…) # has some non-trivial covariance matrix depending on length(observations[i]) (and a)
    observations[i] ~ f(a,b) # f maps R^n -> [0,1]^n
  end
end

I think I should be able to do this by working out what the two-parameter family of Distributions induced by f is and implementing this & all the required helper functions as done here, but this seems unnecessarily complicated.
I tried observations[i] ~ f(a,b) and g(observations[i],a) ~ MvNormal(…) (where g( ., a) is the inverse of f for fixed a), but neither works. Then (inspired by this thread) I tried to implement a multivariate Deterministic distribution, but this didn’t work either.

Is there any chance to get this to work with existing tools? (I.e. without having to write a Distribution class from scratch.)

Welcome to the community :slight_smile:

When you say “but neither works” can you explain what you mean by this? What code did you try to run and what was the error and stacktrace? Particularly the first snippet (observations[i] ~ f(a,b)) seems to be valid and I would expect this to work (unless I’m missing something fundamental).

Thanks for the quick reply!

observations[i] ~ f(a,b) gave the following error:
ERROR: ArgumentError: the right-hand side of a `~` must be a `Distribution` or an array of `Distribution`s

(If it matters, f(a,b) simply returns cdf.(Normal(),b ./ √(1-sigmoid(a))) for the vector b and the scalar a.)
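
In code, that f and the inverse g mentioned earlier look roughly like this (a self-contained sketch; sigmoid here is assumed to be the standard logistic function, and the explicit form of g is just the algebraic inverse of the line above, so worth double-checking):

using Distributions

sigmoid(x) = 1 / (1 + exp(-x))

# f maps the vector b (and scalar a) into [0,1]^n, as described above
f(a, b) = cdf.(Normal(), b ./ √(1 - sigmoid(a)))

# g(., a) undoes f for fixed a: quantile.(Normal(), .) inverts cdf.(Normal(), .)
g(y, a) = quantile.(Normal(), y) .* √(1 - sigmoid(a))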

What do you know about f?

@mschauer I just edited my previous reply to include f when you posted yours.

If you know g (the inverse of f), just apply it to your data before passing it to Turing; then you can do something like your second snippet, except with the transformed data on the left of the ~.

The inverse depends on a, a (random) parameter that also interacts with other quantities like the covariance matrix for b.

Gotcha… Hmm let me think about that…

Ok,

@addlogprob!(logpdf(MvNormal(zeros(...),...),g(...)-stuff))
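
To spell that out a bit (only a sketch, not a definitive implementation: make_cov is a hypothetical placeholder for however the covariance is built, the priors are placeholders, and g is the inverse transform from earlier in the thread):

using Turing, Distributions, LinearAlgebra

make_cov(n, a) = Matrix{Float64}(I, n, n)   # placeholder: identity covariance, just so the sketch runs

@model function predictions(observations)
  a ~ Normal()                              # placeholder prior
  for i in 1:lastindex(observations)
    n = length(observations[i])
    Σ = make_cov(n, a)
    z = g(observations[i], a)               # transform the (fixed) data using the current draw of a
    Turing.@addlogprob!(logpdf(MvNormal(zeros(n), Σ), z))
  end
end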

Note that there is no Jacobian correction because you’re transforming the data which is only ever evaluated at fixed values.

It’s early in the morning so don’t take my word for that… See if you can work out whether there needs to be a correction for the dependence on a…

Is f(a,b) a CDF? (As a PDF it wouldn't necessarily need to be in [0,1].)
Why is b not defined outside the loop, since it doesn't depend on observations?

I think @dlakelan's solution might work, just adding:

@addlogprob!(log(f(observations[i]; a=a, b=b)))

instead of using observations[i].

Does this sound reasonable?


Thank you for your replies! Unfortunately I’m afraid I don’t quite follow. The main problem is that I don’t really know (and couldn’t find a good source for) what’s going on under the hood in these models; in particular what this log probability is added to.

Regarding your questions:

  • f is a CDF in the sense that it outputs a vector of reals between 0 and 1, but these probabilities have nothing to do with how likely that observation is. My data just happens to be a list of lists of probabilities.
  • b has the same dimensions as observations[i] which is a vector of length varying in i.

It isn't so scary under the hood of these models. A model is a recipe for calculating the probability of seeing all the values, observations and parameters, of the model; the log-probability is simply that probability on the log scale. Later, an inference engine can try to find the parameters which maximize this joint probability, or sample from marginals of this joint probability.
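
For example (made-up numbers), for a model with a single parameter x ~ Normal() and a single observation y ~ Normal(x, 1), the quantity being accumulated is just the sum of the two log-densities:

using Distributions

x = 0.7   # candidate parameter value (made up)
y = 1.3   # observed value (made up)

logpdf(Normal(), x) + logpdf(Normal(x, 1), y)   # the joint log-probability log p(x) + log p(y | x)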

It would be nice to see an example of f(a,b) and an example of an observations[1] list. The elements of observations[i] are probabilities, but are they independent of each other given f(a,b)?

Perhaps you should go for a model which looks like:

  a ~ Normal(…)
  for i in 1:lastindex(observations)
    b ~ MvNormal(…) # has some non-trivial covariance matrix depending on length(observations[i]) (and a)
    for j in 1:length(observations[i])
        observations[i][j] ~ f(a,b,j)
    end
  end

If f is a CDF, then its range should be [0,1] and not [0,1]^n. But maybe I'm somehow confused (perhaps f is what is called a copula).

Virtually the entire purpose of a Turing model is to figure out how to calculate log(pdf(YourJointDistribution)) + C

In Turing, in order to put something on the RHS of a ~, it needs to be a struct that is a subtype of some kind of Distribution:

julia> supertype(typeof(Normal(0,1)))
Distribution{Univariate, Continuous}

If you don’t want to go to the whole trouble of defining a distribution, then you can just add some values to the log probability density via @addlogprob!(...) and it will include them in the calculation of the log probability density for your model.

@Dan: observations could be something like [[.8,.9,.6], [.1,.2]]. It shouldn’t have anything to do with copulas or the like since it’s really just a collection of subjective probabilities submitted by a varying number of users (indexed by j in your code) for different questions (indexed by i).

When you say that an inference engine tries to find argmax(logpdf), does this mean it tries to compute the MLE? I was under the impression there’d be some more sophisticated version of a Metropolis algorithm or the like, not gradient descent.

@dlakelan
Ok, I think I’m starting to understand what addlogprob does now—just to be sure, is it correct to say that

@model function test(y)
    x ~ Normal()
    y ~ Normal(x,1)
end

is completely equivalent to

@model function test(y)
    x ~ Normal()
    Turing.@addlogprob!(logpdf(Normal(x,1), y))
end

? (At least for a single number y they seem to give approximately the same result.)
If that’s the case, I can take it from here.

Thanks for the extra information, it really helps putting things in context.

Given the context, it is a fair modeling decision to assume people's answers to questions are not influencing one another (unless they are sequentially answering out loud in the same room or some such scenario). So each observations[i][j] is distributed the same as observations[i][j'], and they are independent (given the question).

A model close to the suggested one could be:

  a ~ Normal(…)
  for i in 1:lastindex(observations)
    b ~ MvNormal(…) # has some non-trivial covariance matrix depending on length(observations[i]) (and a)
    for j in 1:length(observations[i])
        d = Beta(f1(a,b), f2(a,b))
        observations[i][j] ~ d
    end
  end

This should be a valid model. f1 and f2 can influence the observation distribution in the usual way of Beta distributions (i.e. more evidence towards p = 0.0 corresponds to f2(.) >> f1(.), etc.).
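
A quick numerical check of that intuition (parameter values made up):

using Distributions, Statistics

mean(Beta(1, 9))   # ≈ 0.1: mass concentrated near 0 when the second parameter dominates
mean(Beta(9, 1))   # ≈ 0.9: mass concentrated near 1 when the first parameter dominates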

This is a nice model, and with regard to the copulas I mentioned, it seems to me this is exactly the way copulas generate correlated random variables (it might be worth trying a copula package straight out of the box).

More or less yes.

@Dan: Technically speaking, the model treats predictions as exchangeable and as something that's “correlated with the truth”. So the predictions will, in general, have pairwise correlations because of that.

I think I managed to make it work with

@model function predictions(observations)
  a ~ Normal(…)
  for i in 1:lastindex(observations)
    Turing.@addlogprob!( logpdf(MvNormal(…), f(a, observations[i])) )
  end
end
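
(For completeness, a usage sketch: the toy data below follows the format from earlier in the thread, and the sampler choice and number of draws are arbitrary.)

using Turing

observations = [[0.8, 0.9, 0.6], [0.1, 0.2]]   # toy data, a vector of probability vectors
chain = sample(predictions(observations), NUTS(), 1_000)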

Thanks again for all your help!
