I have a model where the observations come from something of the form:

```julia
@model function predictions(observations)
    a ~ Normal(…)
    for i in 1:lastindex(observations)
        b ~ MvNormal(…)  # non-trivial covariance matrix depending on length(observations[i]) (and a)
        observations[i] ~ f(a, b)  # f maps R^n -> [0,1]^n
    end
end
```
I think I should be able to do this by working out the two-parameter family of distributions induced by f and implementing it, together with all the required helper functions, as done here, but this seems unnecessarily complicated.
I tried `observations[i] ~ f(a,b)` and `g(observations[i], a) ~ MvNormal(…)` (where `g(·, a)` is the inverse of f for fixed a), but neither works. Then (inspired by this thread) I tried to implement a multivariate Deterministic distribution, but this didn't work either.
Is there any way to get this to work with existing tools? (I.e. without having to write a Distribution type from scratch.)
When you say "but neither works", can you explain what you mean by this? What code did you try to run, and what were the error and stack trace? In particular, the first snippet (`observations[i] ~ f(a,b)`) looks valid to me and I would expect it to work (unless I'm missing something fundamental).
`observations[i] ~ f(a,b)` gave the following error:

```
ERROR: ArgumentError: the right-hand side of a `~` must be a `Distribution` or an array of `Distribution`s
```

(If it matters, `f(a,b)` simply returns `cdf.(Normal(), b ./ √(1 - sigmoid(a)))` for the vector `b` and the scalar `a`.)
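Spelled out as code, f and the inverse g I mentioned look roughly like this (just a sketch, with `sigmoid` written out by hand rather than taken from a package):

```julia
using Distributions

sigmoid(x) = 1 / (1 + exp(-x))

# f maps the vector b (and the scalar a) into [0,1]^n
f(a, b) = cdf.(Normal(), b ./ √(1 - sigmoid(a)))

# g(·, a) inverts f(a, ·): the normal quantile undoes the normal cdf
g(y, a) = quantile.(Normal(), y) .* √(1 - sigmoid(a))
```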
Thank you for your replies! Unfortunately I’m afraid I don’t quite follow. The main problem is that I don’t really know (and couldn’t find a good source for) what’s going on under the hood in these models; in particular what this log probability is added to.
Regarding your questions:
f is a CDF in the sense that it outputs a vector of reals between 0 and 1, but these probabilities have nothing to do with how likely that observation is. My data just happens to be a list of lists of probabilities.
b has the same dimensions as observations[i], which is a vector whose length varies with i.
It isn't so scary under the hood of these models. A model is a recipe for calculating the joint probability of seeing all the values in the model, observations and parameters alike; the log probability is just that same probability on a log scale. Later, an inference engine can try to find the parameters which maximize this joint probability, or sample from the marginals of this joint probability.
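For a toy model, those two options look roughly like this (a sketch; mode estimation needs Optim loaded next to Turing):

```julia
using Turing, Optim

@model function toy(y)
    μ ~ Normal(0, 1)   # a parameter with its prior
    y ~ Normal(μ, 1)   # the observation
end

m = toy(0.7)

# find the parameter values that maximize the joint probability (MAP estimate)
map_est = optimize(m, MAP())

# or sample from the posterior marginals of that joint probability
chain = sample(m, NUTS(), 1_000)
```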
It would be nice to see an example of f(a,b) and an example of an observations[1] list. The elements of observations[i] are probabilities, but are they independent of each other given f(a,b)?
Perhaps you should go for a model which looks like:

```julia
a ~ Normal(…)
for i in 1:lastindex(observations)
    b ~ MvNormal(…)  # non-trivial covariance matrix depending on length(observations[i]) (and a)
    for j in 1:length(observations[i])
        observations[i][j] ~ f(a, b, j)
    end
end
```
If f is a CDF, then its range should be [0,1] and not [0,1]^n. But maybe I'm somehow confused (perhaps f is what is called a copula).
If you don't want to go to all the trouble of defining a distribution, then you can just add some values to the log probability density via `@addlogprob!(...)` and it will include them in the calculation of the log probability density for your model.
@Dan: observations could be something like `[[.8, .9, .6], [.1, .2]]`. It shouldn't have anything to do with copulas or the like since it's really just a collection of subjective probabilities submitted by a varying number of users (indexed by j in your code) for different questions (indexed by i).
When you say that an inference engine tries to find argmax(logpdf), does this mean it tries to compute the MLE? I was under the impression there’d be some more sophisticated version of a Metropolis algorithm or the like, not gradient descent.
@dlakelan
Ok, I think I'm starting to understand what `@addlogprob!` does now. Just to be sure: is it correct to say that
```julia
@model function test(y)
    x ~ Normal()
    y ~ Normal(x, 1)
end
```
is completely equivalent to
```julia
@model function test(y)
    x ~ Normal()
    Turing.@addlogprob!(logpdf(Normal(x, 1), y))
end
```
? (At least for a single number y they seem to give approximately the same result.)
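A rough way to check this (just a sketch) is to sample both versions and compare the posterior mean of x:

```julia
using Turing, Random, Statistics

@model function test_tilde(y)
    x ~ Normal()
    y ~ Normal(x, 1)
end

@model function test_addlogprob(y)
    x ~ Normal()
    Turing.@addlogprob!(logpdf(Normal(x, 1), y))
end

Random.seed!(1)
c1 = sample(test_tilde(0.5), NUTS(), 2_000)

Random.seed!(1)
c2 = sample(test_addlogprob(0.5), NUTS(), 2_000)

# should agree up to Monte Carlo error
mean(c1[:x]), mean(c2[:x])
```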
If that’s the case, I can take it from here.
Thanks for the extra information, it really helps put things in context.
Given the context, it is a fair modeling decision to assume people's answers to questions are not influencing one another (unless they are answering sequentially, out loud, in the same room, or some such scenario). So each observations[i][j] is distributed the same as observations[i][j′], and they are independent (given the question).
A model close to the suggested one could be:
```julia
a ~ Normal(…)
for i in 1:lastindex(observations)
    b ~ MvNormal(…)  # non-trivial covariance matrix depending on length(observations[i]) (and a)
    for j in 1:length(observations[i])
        d = Beta(f1(a, b), f2(a, b))
        observations[i][j] ~ d
    end
end
```
This should be a valid model. f1 and f2 can influence the observation distribution in the usual way of Beta parameters (i.e. more evidence towards p = 0.0 corresponds to f1(·) << f2(·), etc.).
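For example (purely hypothetical f1 and f2, only to show the shape of the mapping into the two positive Beta parameters):

```julia
# hypothetical illustration: any positive-valued functions of (a, b) would do
f1(a, b) = exp(a + sum(b) / length(b))   # larger ⇒ more mass towards p = 1
f2(a, b) = exp(a - sum(b) / length(b))   # larger ⇒ more mass towards p = 0
```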
This is a nice model, and with regard to the copulas I mentioned: it seems to me this is exactly how copulas generate correlated random variables (it might be worth trying a copula package directly).
@Dan: Technically speaking, the model treats predictions as exchangeable and as something that's "correlated with the truth", so in general the predictions will have pairwise correlations because of that.
I think I managed to make it work with
```julia
@model function predictions(observations)
    a ~ Normal(…)
    for i in 1:lastindex(observations)
        Turing.@addlogprob!(logpdf(MvNormal(…), f(a, observations[i])))
    end
end
```
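Running it then looks roughly like this (a sketch; the actual arguments behind the `…` are omitted here):

```julia
# hypothetical usage: `observations` is a vector of probability vectors
observations = [[0.8, 0.9, 0.6], [0.1, 0.2]]

model = predictions(observations)
chain = sample(model, NUTS(), 1_000)
```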