I’ve never used Turing before and don’t have a lot of statistics knowledge but I’d love to be able to do this kind of thing anyway:

I don’t have any data. I only want to compute the expectation value of a somewhat complicated random variable.

Suppose you have N (e.g. 800_000) things, each of which is “a failure” with (i.i.d.) probability Q (e.g. 5%) and otherwise “a success”. Call the total number of failures K. (I write `K ~ Binomial(N, Q)`.) Suppose I draw a sub-sample of size n (e.g. 10_000) and count the number of failures in the sample; call it k. (I write `k ~ Hypergeometric(K, N-K, n)`, which is now a compound probability distribution.)

Suppose I repeat this a million times. Is Turing.jl a good tool for getting samples for (K, k) even though I don’t want inference? I tried

```
using Turing, StatsPlots, Random
@model function gdemo()
N = 800_000
Q = 0.05
n = 10_000
K ~ Binomial(N, Q)
k ~ Hypergeometric(K, N-K, n)
end
# Choose some arbitrary sampler...
my_sample = sample(gdemo(), SMC(), 1_000_000)
```

This runs (single-threaded) and produces some samples. **However, this takes an hour on my laptop. Is that normal or am I doing it wrong?**
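For comparison, since there is no data to condition on here, I could presumably also forward-simulate directly with Distributions.jl and plain `rand` calls. A minimal sketch with the same constants as the model above (just my attempt at a baseline, not necessarily the recommended way):

```
using Distributions, Random, Statistics

# Same constants as the Turing model above.
N, Q, n = 800_000, 0.05, 10_000
m = 1_000_000  # number of Monte Carlo samples

# Forward simulation: draw K, then k given K.
Ks = rand(Binomial(N, Q), m)
ks = [rand(Hypergeometric(K, N - K, n)) for K in Ks]

mean(ks) / n  # should be close to Q
```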

As a sanity check, I expect the distribution of k/n to be very close to `Normal(Q, sqrt(Q*(1-Q) / n))`, which looks ok:

```
mean(my_sample[:k] / 10_000) # 0.0500006641
std(my_sample[:k] / 10_000) # 0.002180021202067455
sqrt(0.05*(1-0.05) / 10_000) # 0.0021794494717703367
```

However, does the way I’m using Turing (with this or some other sampler) get me *closer* to the true distribution than this Normal approximation? I also want to try different combinations of N, n, Q where the approximation may be bad.
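To probe where the Normal approximation breaks down, I imagine wrapping the simulation in a small helper and sweeping over parameters. The helper name and structure are mine, just a sketch:

```
using Distributions, Statistics

# Hypothetical helper: forward-simulate k/n and compare its spread
# to the Normal approximation sqrt(Q*(1-Q)/n).
function check_approx(N, n, Q; m = 100_000)
    ks = map(1:m) do _
        K = rand(Binomial(N, Q))
        rand(Hypergeometric(K, N - K, n))
    end
    (sim_std = std(ks ./ n), approx_std = sqrt(Q * (1 - Q) / n))
end

check_approx(800_000, 10_000, 0.05)
check_approx(800_000, 500, 0.001)
```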

As the next step in the process I’m modeling, after drawing my sub-sample and counting failures `k`, I build a discrete random variable C (which can e.g. take values 0…10). I have functions f_0, \dots, f_{10} of two arguments (“like K and k”), which are pure and fast to evaluate but have no analytic form (some array look-ups with data). Define

Finally, I want to find an “optimal sub-sample size `n`”:

Note that C is also a random variable, so “each sample has to use its own f_C”. However, if I put something like `C = argmax(... k ... K ...)` into the model definition, C isn’t a random variable in the model, and I haven’t seen anything in the Turing guide hinting at how to do this. **How can I get the expectation value that depends on C?** I don’t necessarily need to *really do* the argmax over `n`; maybe just trying a few values and picking the best is fine.
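One idea I had, in case it helps frame the question: since C is a deterministic function of (K, k), maybe it can be computed per sample *after* sampling, entirely outside the model? A sketch with dummy stand-ins for my real f_j and argmax criterion (the real versions are array look-ups):

```
using Distributions, Statistics

N, Q, n = 800_000, 0.05, 10_000
m = 100_000

# Dummy stand-ins (my real versions are look-up tables):
score(j, K, k) = -abs(j - 10 * k / n)     # selection criterion for C
fs = [(K, k) -> j * k / n for j in 0:10]  # plays the role of f_0, ..., f_10

Ks = rand(Binomial(N, Q), m)
ks = [rand(Hypergeometric(K, N - K, n)) for K in Ks]

# Each sample gets its own C and hence its own f_C:
Cs = [argmax(j -> score(j, K, k), 0:10) for (K, k) in zip(Ks, ks)]
vals = [fs[C + 1](K, k) for (C, K, k) in zip(Cs, Ks, ks)]  # C is 0-based
mean(vals)  # Monte Carlo estimate of E[f_C(K, k)]
```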

Is it reasonable to do this with Turing.jl? I’m not quite sure yet how to approach it.