The main point of the model is that I only observe sales, but I want to do inference on lam and stocks.

Turing tells me that the last line is wrong because I can only sample from Distributions (which of course makes sense).
I would have implemented the last line as sales ~ Delta(min(demand, stocks)), but there’s no such thing as a Delta distribution in Distributions.jl.
Is there a way to express this in Turing?

Not really a Turing issue here, more a Distributions one. There appears to be an implementation of a Dirac distribution here, but it seems the effort stalled.

I’m not sure I understand. What do you assume to be your observation model? If I’m not mistaken your observations are a deterministic transformation of your parameters. Is that correct?

I don’t think you are going to get reasonable inference results with a Dirac delta.

This will likely not work: a Normal distribution with zero variance has infinite density at its mean, so the joint log density will always be infinite and HMC will not work.
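To see why, here is a minimal sketch (using only Distributions.jl; the particular values of σ are just for illustration) of how the density of a Normal blows up at its mean as the standard deviation shrinks toward zero:

```julia
using Distributions

# The density at the mean is 1 / (σ * sqrt(2π)), which diverges as σ → 0,
# so the log density at an observed point that exactly matches the mean
# grows without bound.
for σ in (1.0, 1e-3, 1e-6)
    println("σ = $σ: pdf at mean = ", pdf(Normal(0.0, σ), 0.0))
end
```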

If I’m not mistaken your observations are a deterministic transformation of your parameters. Is that correct?

Yes that’s correct, and I should have phrased my question accordingly:
How can I observe / condition on deterministic transformations of my random variables?

And I should have also mentioned:
I am not necessarily tied to HMC. VI would also work for me.
Though with this particular model, even the forward pass fails:

prior = mymodel(missing)
prior()

raises ArgumentError: Right-hand side of a ~ must be subtype of Distribution or a vector of Distributions

It seems that your random variables stocks and demand are discrete and so are your sales. So maybe use a Categorical distribution for sales with a probability of 1 for one of the categories and 0 for all the others. Since you have discrete random variables, you should consider MH, PG or Gibbs, not pure HMC.
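A quick sketch of that idea: a Categorical with all mass on one category behaves like a discrete point mass, with logpdf 0 at the chosen category and -Inf everywhere else (the category indices below are arbitrary, just for illustration):

```julia
using Distributions

# Put all probability mass on category 3 (out of 5).
p = zeros(5)
p[3] = 1.0
d = Categorical(p)

logpdf(d, 3)  # log(1) = 0.0
logpdf(d, 2)  # log(0) = -Inf
```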

This particular case can be reduced to stocks ~ truncated(Poisson(lam), 0, 20), right? A uniform categorical doesn’t add anything.
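For reference, that truncated Poisson can be written directly with Distributions.jl’s truncated wrapper; the support is clipped to 0..20 and the probabilities are renormalized (lam below is an arbitrary example value):

```julia
using Distributions

lam = 1.5
d = truncated(Poisson(lam), 0, 20)

insupport(d, 20)  # true: 20 is still in the support
insupport(d, 21)  # false: mass beyond 20 has been cut off
cdf(d, 20)        # 1.0: all remaining mass lies in 0..20
```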

To be able to use Turing (or rather, any MCMC algorithm) for sampling, a requirement is that you can evaluate the logpdf pointwise, so there won’t be a general way to handle what you’re suggesting; for that you would have to look at something like approximate Bayesian computation (ABC). Hence, something like a Delta for continuous variables won’t work. For transformations of continuous variables, have a look at Bijectors.jl, which allows you to transform variables using a restricted family of functions for which everything is well-defined.
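As a sketch of why such restricted transformations stay tractable: for an invertible map like exp, the transformed log density follows from the change-of-variables formula, which is exactly what Distributions.jl’s LogNormal encodes for a standard Normal pushed through exp (the evaluation point is arbitrary):

```julia
using Distributions

x = 2.0

# Change of variables for Y = exp(X), X ~ Normal(0, 1):
#   logpdf_Y(y) = logpdf_X(log(y)) - log(y)
# where -log(y) is the log |Jacobian| of the inverse map.
manual  = logpdf(Normal(0, 1), log(x)) - log(x)
builtin = logpdf(LogNormal(0, 1), x)
# manual ≈ builtin
```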

For even more general (but discrete) cases than what I suggest in (1), you can probably get quite far by introducing an empirical distribution, i.e. something like

julia> using Distributions

julia> struct EmpiricalPDF{T} <: DiscreteUnivariateDistribution
           xs::Vector{T}
       end

julia> function Distributions.logpdf(d::EmpiricalPDF, x::Int)
           # (inefficiently) count how many times the value occurred, then take the log
           return log(mean(x .== d.xs))
       end

julia> xs = rand(Categorical(fill(1, 20) ./ 20), 100);

julia> ys = rand(Poisson(1.), 100);

julia> dist = EmpiricalPDF(min.(xs, ys))
EmpiricalPDF{Int64}(
xs: [1, 3, 1, 2, 1, 0, 1, 0, 2, 1 … 0, 1, 1, 0, 2, 2, 2, 1, 1, 1]
)

julia> logpdf(dist, 1)
-0.9416

julia> logpdf(dist, 2)
-1.7148

julia> logpdf(dist, 3)
-2.8134

You’d have to reconstruct the distribution for every lam though, i.e. your Turing model would be something like

@model mymodel(sales) = begin
    lam ~ Gamma(2, 3)
    stocks ~ Categorical(ones(20) ./ 20)
    demand ~ Poisson(lam)
    demand_ = rand(Poisson(lam), 100)
    stocks_ = rand(Categorical(ones(20) ./ 20), 100)
    sales ~ EmpiricalPDF(min.(demand_, stocks_)) # <= will end up calling `logpdf(::EmpiricalPDF, sales)`, which we defined above
end

And this you can sample from using something like MH or PG :) This is quite hacky though, so a better way would be to derive the closed-form distribution that results from taking the min of the variables.
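For this model the closed form is indeed easy to derive: for independent discrete D and S, P(min(D, S) ≥ k) = P(D ≥ k) · P(S ≥ k), so the pmf of the min follows by differencing the survival function. A sketch (the particular parameters are just for illustration):

```julia
using Distributions

D = Poisson(1.0)                 # demand
S = Categorical(fill(1/20, 20))  # stocks, uniform on 1:20

# P(X ≥ k) = ccdf(X, k - 1), since ccdf(X, x) = P(X > x).
surv(d, k) = ccdf(d, k - 1)

# pmf of min(D, S): P(min ≥ k) - P(min ≥ k + 1),
# with P(min ≥ k) factorizing by independence.
pmf_min(k) = surv(D, k) * surv(S, k) - surv(D, k + 1) * surv(S, k + 1)

# min(D, S) lives on 0:20, so the pmf sums to 1 there;
# pmf_min(0) = P(D = 0), because S ≥ 1 always.
total = sum(pmf_min.(0:20))
```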

Wow, lots of useful answers. Thank you all for your time! I’ll try to answer in one post:

That does the trick for the forward pass (prior predictive), and even SMC does give a result, though I don’t know yet whether the resulting samples make sense (or ever could, based on what @trappmartin said).

Cool! I think that does exactly what I want! Thanks for the suggestion!

I think that would not be equivalent. In my model, stocks should be an unobserved discrete latent variable over which I want to marginalize. The maximum value of 20 is kind of an arbitrary choice that ideally should not have an impact on the model as long as 20 >> demand.
Truncating demand at 20 would be equivalent to observing stocks = 20.

I can’t really put my finger on it, but I think this requirement should still be satisfied by my particular model.

I now realize that a Delta distribution is not really the right thing for what I want here, since that is usually understood to live on a continuous domain, and my problem really boils down to observing on a deterministic transformation of discrete random variables.
I think what @mohamed82008 suggested is what I was looking for.

Even though this is OT: I’m kind of blown away by how engaged this community is! Thank you all!

–EDIT–

Just posting the model with the suggestion from @mohamed82008, extended to more than one observation:

@model mymodel(sales_p1, N) = begin
    if sales_p1 === missing
        sales_p1 = Vector{Int}(undef, N)
    end
    # preallocate the latent arrays so the indexed ~ statements have somewhere to write
    stocks_p1 = Vector{Int}(undef, N)
    demand = Vector{Int}(undef, N)
    lam ~ Gamma(2, 3)
    for i in 1:length(sales_p1)
        stocks_p1[i] ~ Categorical(ones(21) ./ 21) # stocks_p1 = stocks + 1
        demand[i] ~ Poisson(lam)
        p = zeros(21)
        p[min(demand[i] + 1, stocks_p1[i])] = 1.0
        sales_p1[i] ~ Categorical(p) # sales_p1 = sales + 1
    end
    return lam, stocks_p1, demand, sales_p1
end