# Target variables are functions of Gaussians (aka how to define a deterministic multivariate distribution?)

I have a model where observations come from something of the form

``````
@model function predictions(observations)
    a ~ Normal(…)
    for i in 1:lastindex(observations)
        b ~ MvNormal(…) # has some non-trivial covariance matrix depending on length(observations[i]) (and a)
        observations[i] ~ f(a, b) # f maps R^n -> [0,1]^n
    end
end
``````

I think I should be able to do this by working out the two-parameter family of distributions induced by f and implementing it with all the required helper functions as done here, but this seems unnecessarily complicated.
I tried `observations[i] ~ f(a,b)` and `g(observations[i], a) ~ MvNormal(…)` (where g(·, a) is the inverse of f for fixed a), but neither works. Then (inspired by this thread) I tried to implement a multivariate Deterministic distribution, but that didn’t work either.

Is there any chance to get this to work with existing tools? (I.e. without having to write a Distribution class from scratch.)

Welcome to the community

When you say “but neither works” can you explain what you mean by this? What code did you try to run and what was the error and stacktrace? Particularly the first snippet (`observations[i] ~ f(a,b)`) seems to be valid and I would expect this to work (unless I’m missing something fundamental).

`observations[i] ~ f(a,b)` gave the following error:

``````
ERROR: ArgumentError: the right-hand side of a `~` must be a `Distribution` or an array of `Distribution`s
``````

(If it matters, `f(a,b)` simply returns `cdf.(Normal(),b ./ √(1-sigmoid(a)))` for the vector b and the scalar a.)
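For concreteness, here is a sketch of this `f` together with its inverse `g(·, a)` for fixed `a` (the names `f`, `g`, and `sigmoid` follow the thread; this is only an illustration, not code from the actual model):

```julia
using Distributions  # for the standard-normal cdf / quantile

sigmoid(x) = 1 / (1 + exp(-x))

# f maps R^n -> [0,1]^n by pushing b through a rescaled standard-normal CDF
f(a, b) = cdf.(Normal(), b ./ sqrt(1 - sigmoid(a)))

# g(·, a) inverts f for fixed a via the normal quantile function
g(y, a) = quantile.(Normal(), y) .* sqrt(1 - sigmoid(a))

a, b = 0.5, [0.1, -0.3, 1.2]
g(f(a, b), a)  # round-trips back to b
```

Note that the scale factor `sqrt(1 - sigmoid(a))` appears in both directions, which is exactly why the inverse cannot be applied to the data before entering the model: it depends on the random parameter `a`.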

What do you know about `f`?

@mschauer I just edited my previous reply to include `f` when you posted yours.

If you know g (inverse of f) just apply it to your data before passing it to Turing, then you can do

Except put the transformed data on the left

The inverse depends on `a`, a (random) parameter that also interacts with other quantities like the covariance matrix for `b`.

Gotcha… Hmm let me think about that…

Ok,

`@addlogprob!(logpdf(MvNormal(zeros(...),...),g(...)-stuff))`

Note that there is no Jacobian correction because you’re transforming the data which is only ever evaluated at fixed values.

It’s early in the morning so don’t take my word for that… See if you can work out whether there needs to be a correction for the dependence on a…
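A sketch of the change-of-variables bookkeeping behind this question (using the thread's notation, with observed $y = f(a, b)$ and inverse $b = g(y, a)$, and $\Sigma(a)$ the covariance of $b$):

$$
\log p(y \mid a) \;=\; \log \mathcal{N}\!\big(g(y, a);\, 0,\, \Sigma(a)\big) \;+\; \log\left|\det \frac{\partial g}{\partial y}(y, a)\right|
$$

The data $y$ is held fixed, but the Jacobian factor still depends on the random parameter $a$; if it is not constant in $a$, it changes the shape of the posterior and would need to be included via its own `@addlogprob!` term.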

Is `f(a,b)` a CDF? (As a PDF its values wouldn’t necessarily need to lie in [0,1].)
Why is `b` not defined outside the loop, since it doesn’t depend on `observations`?

I think @dlakelan’s solution might work, just adding:

``````@addlogprob!(log(f(observations[i]; a=a, b=b)))
``````

instead of the `observations[i] ~ …` statement.

Does this sound reasonable?


Thank you for your replies! Unfortunately I don’t quite follow. The main problem is that I don’t really know (and couldn’t find a good source for) what’s going on under the hood of these models; in particular, what this log probability is added to.

• `f` is a CDF in the sense that it outputs a vector of reals between 0 and 1, but these probabilities have nothing to do with how likely that observation is. My data just happens to be a list of lists of probabilities.
• `b` has the same dimensions as `observations[i]` which is a vector of length varying in `i`.

It isn’t so scary under the hood of these models. A model is a recipe for calculating the joint probability of all the values in the model, observations and parameters alike (the log-probability is just that probability on a log scale). Later, an inference engine can try to find the parameters which maximize this joint probability, or sample from marginals of it.
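For instance, the log joint of a tiny two-variable model is just the sum of the individual log-densities (a generic illustration using Distributions.jl, not the model from this thread):

```julia
using Distributions

# log p(x, y) = log p(x) + log p(y | x) for x ~ N(0,1), y | x ~ N(x, 1)
x, y = 0.3, 1.2
lj = logpdf(Normal(0, 1), x) + logpdf(Normal(x, 1), y)
# Inference then maximizes lj over x (MAP) or samples x from it (MCMC),
# with y held fixed at its observed value.
```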

It would be nice to see an example of `f(a,b)` and an example of an `observations[1]` list. The elements of `observations[i]` are probabilities, but are they independent of each other given `f(a,b)`?

Perhaps you should go for a model which looks like:

``````
a ~ Normal(…)
for i in 1:lastindex(observations)
    b ~ MvNormal(…) # has some non-trivial covariance matrix depending on length(observations[i]) (and a)
    for j in 1:length(observations[i])
        observations[i][j] ~ f(a, b, j)
    end
end
``````

If `f` is a CDF, then its range should be `[0,1]` and not `[0,1]^n`. But maybe I’m somehow confused (perhaps `f` is what-is-called a copula).

Virtually the entire purpose of a Turing model is to figure out how to calculate `log(pdf(YourJointDistribution)) + C`

In Turing, in order to put something on the RHS of a `~` it needs to be a struct that is a subtype of `Distribution`.

``````
julia> supertype(typeof(Normal(0, 1)))
Distribution{Univariate, Continuous}
``````

If you don’t want to go to the whole trouble of defining a distribution, then you can just add some values to the log probability density via `@addlogprob!(...)` and it will include them in the calculation of the log probability density for your model.
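Concretely, these two model bodies should score the data identically (a sketch assuming Turing.jl, with `y` observed; this is a generic toy example, not the model from this thread):

```julia
using Turing

# Version 1: let Turing score y through a tilde statement
@model function with_tilde(y)
    x ~ Normal()
    y ~ Normal(x, 1)
end

# Version 2: add the same log-density term by hand
@model function with_addlogprob(y)
    x ~ Normal()
    @addlogprob!(logpdf(Normal(x, 1), y))
end
```

Both add `logpdf(Normal(x, 1), y)` to the model's accumulated log-density; the difference is that `@addlogprob!` accepts an arbitrary expression, whereas `~` requires a `Distribution` on its right-hand side.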

@Dan: `observations` could be something like `[[.8,.9,.6], [.1,.2]]`. It shouldn’t have anything to do with copulas or the like since it’s really just a collection of subjective probabilities submitted by a varying number of users (indexed by `j` in your code) for different questions (indexed by `i`).

When you say that an inference engine tries to find argmax(logpdf), does this mean it tries to compute the MLE? I was under the impression there’d be some more sophisticated version of a Metropolis algorithm or the like, not gradient descent.

@dlakelan
Ok, I think I’m starting to understand what addlogprob does now—just to be sure, is it correct to say that

``````
@model function test(y)
    x ~ Normal()
    y ~ Normal(x, 1)
end
``````

is completely equivalent to

``````
@model function test(y)
    x ~ Normal()
    @addlogprob!(logpdf(Normal(x, 1), y))
end
``````

? (At least for a single number `y` they seem to give the ≈same result.)
If that’s the case, I can take it from here.

Thanks for the extra information, it really helps putting things in context.

Given the context, it is a fair modeling decision to assume people’s answers to questions do not influence one another (unless they are sequentially answering out loud in the same room or some such scenario). So each `observations[i][j]` is distributed the same as `observations[i][j']`, and they are independent (given the question).

A model close to the suggested one could be:

``````
a ~ Normal(…)
for i in 1:lastindex(observations)
    b ~ MvNormal(…) # has some non-trivial covariance matrix depending on length(observations[i]) (and a)
    for j in 1:length(observations[i])
        d = Beta(f1(a, b), f2(a, b))
        observations[i][j] ~ d
    end
end
``````

This should be a valid model. `f1` and `f2` can influence the observation distribution in the usual way of Beta distributions (i.e. more evidence towards `p = 0.0` will have `f2(.)` >> `f1(.)`, since the mean of `Beta(α, β)` is `α/(α+β)`, and so on).

This is a nice model, and with regard to the copulas I mentioned, it seems to me this is exactly the way copulas generate correlated random variables (it might be worth trying a copula package out of the box).

More or less yes.

@Dan: Technically speaking, the model treats predictions as exchangeable and “correlated with the truth”. So in general predictions will have pairwise correlations because of that.

I think I managed to make it work with

``````@model function predictions(observations)
a ~ Normal(…)
for i in 1:lastindex(observations)