Well, instead of trying to forecast the sum of a bunch of things (which is relatively easy), you are trying to forecast each of the individual things (which is harder!).
Here’s a quick example using Turing and your test scores idea. Let’s say a teacher gave a particular exam to 200 students total over the course of last year. This year, the teacher wants to estimate what the year-end distribution of test scores is likely to be, given last year’s data and the test scores for 10 students that have taken the exam so far this year.
```julia
using Distributions
using MCMCChains
using StatsPlots
using Turing

# True underlying distributions (which we'd never know in practice)
d_old = Normal(70.0, 25.0)
d_new = Normal(78.0, 20.0) # this year's class is smarter

# Last year's 200 observed scores, and a prior fitted to them
scores_old = rand(d_old, 200)
prior = fit(Normal, scores_old)

@model function test_scores(scores)
    μ ~ Normal(prior.μ, 25)
    σ ~ Uniform(0, 50)
    scores .~ Normal(μ, σ)  # each observed score is drawn from Normal(μ, σ)
end

# Scores observed so far this year
N = 10
scores_new = rand(d_new, N)

chain = sample(test_scores(scores_new), NUTS(), 1000)
μ_summary = chain[:μ]
σ_summary = chain[:σ]

plot(
    prior,
    label="Prior",
    xlabel="Score",
    ylabel="Density",
    title="N = $N \n CI: ($(round(Int, quantile(vec(μ_summary), 0.025))), $(round(Int, quantile(vec(μ_summary), 0.975))))"
)
vline!([prior.μ], label="Prior μ")
plot!(Normal(mean(μ_summary), mean(σ_summary)), label="Posterior")
vline!([mean(μ_summary)], label="Posterior μ")
```
If we had 100 scores from this year’s class rather than 10:
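A minimal sketch of the rerun (only `N` changes; the model and plotting code above are reused as-is):

```julia
N = 100
scores_new = rand(d_new, N)
chain = sample(test_scores(scores_new), NUTS(), 1000)
μ_summary = chain[:μ]
σ_summary = chain[:σ]
# ...then repeat the plotting calls from above
```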
As you can see, the credible interval is much smaller when we have more data.
Hi @mthelm85, thank you so very much for putting in the effort to create an example. Much obliged. I'll be studying this later this evening.
I had some doubts about raising questions in this community (being green and all), but am very happy I did. Thanks again for raising the bar on being helpful! :happy-camper: Cheers!
I've looked at your example and tried it out. I hope you don't mind me asking some follow-up questions.
- I'm wondering why a `fit` for the prior was chosen (as opposed to just taking 70 from `d_old`). And in the same regard: why only use `prior.μ` in the model, as opposed to also using the fitted σ, i.e. `μ ~ Normal(prior.μ, prior.σ)`?
- What's the reasoning behind choosing a `Uniform(0, 50)` distribution for σ in the model? I think I understand why it is uniform, but wouldn't it be a more accurate model if a different interval was chosen (e.g. 15 to 35)?
There were a few things I didn't understand at first glance, so I iterated over different values of `N` to get a better grasp of what was happening. I've made a plot to better see what was going on (TIL). Maybe it's useful for other people following this discussion thread as well.
NOTE: The fitted prior was constant in every iteration; the `scores_new` were resampled.
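Roughly what I did, as a simplified sketch (not my exact plotting code):

```julia
# The prior is fitted once and held fixed; scores_new is redrawn and the
# model refit for every curve.
plot(prior, label="Prior", xlabel="Score", ylabel="Density")
for i in 1:5
    scores_new = rand(d_new, N)
    chain = sample(test_scores(scores_new), NUTS(), 1000)
    plot!(Normal(mean(chain[:μ]), mean(chain[:σ])), label="Posterior $i")
end
```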
Plot showing different `scores_new` samples with N = 10
Plot showing different `scores_new` samples with N = 100
Plot showing different `scores_new` samples with N = 1000
One final thing I was wondering about when looking at the graphs: how do you tell when `N` is sufficiently high (i.e. the wiggliness is low)? Would you only take the CI into consideration, or are there other measures one can use?
Thank you again for showing me all this and taking the time to provide examples. I understand you might have better things to do than answer my questions, so no pressure.
Those are excellent questions, and the answer to both of them is mostly that the example was thrown together quickly/sloppily.
In this case, `d_old` and `d_new` are the true underlying distributions, which I almost never know in practice. I generated samples from `d_old` and sort of assumed that would be the starting point in a real-world scenario (you observe some data), and then you might fit a distribution to that data. In this example, of course, there was no reason to do that since we have the actual distribution. In the model, we are saying we want to draw the parameter μ from a normal distribution whose mean is the fitted `prior.μ` and whose standard deviation is, in this case, arbitrarily chosen. I think the choice of standard deviation in the line `μ ~ Normal(prior.μ, 25)` is a reflection of how confident you are that the mean for the new distribution could/should be close to the mean of the old distribution.
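To make that concrete, here's an illustrative sketch (the model name and the values 5 and 50 are my own, not from the example above) where the prior standard deviation is a parameter you can dial:

```julia
# prior_sd encodes confidence that the new mean is close to the old one:
# small prior_sd → the prior dominates; large prior_sd → the data dominate.
@model function test_scores_priorsd(scores, prior_sd)
    μ ~ Normal(prior.μ, prior_sd)
    σ ~ Uniform(0, 50)
    scores .~ Normal(μ, σ)
end

chain_tight = sample(test_scores_priorsd(scores_new, 5.0), NUTS(), 1000)    # confident
chain_diffuse = sample(test_scores_priorsd(scores_new, 50.0), NUTS(), 1000) # uncertain
```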
The `Uniform(0, 50)` prior on σ was arbitrary. The interpretation of this choice is that I'm saying all values between 0 and 50 are equally likely, i.e., it's expressing a high degree of uncertainty regarding σ. If you wanted to express total uncertainty, you could do `Uniform(0, 100)`, but in this case we can compute the standard deviation of our observed data, so maybe we could have done `σ ~ truncated(Normal(prior.σ, 10), 0, 50)`, or something like that.
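Written out as a full model, that suggestion would look something like this (a sketch; `test_scores_alt` is just an illustrative name):

```julia
@model function test_scores_alt(scores)
    μ ~ Normal(prior.μ, 25)
    # Informative prior on σ: centered on the fitted prior.σ, clipped to [0, 50]
    σ ~ truncated(Normal(prior.σ, 10), 0, 50)
    scores .~ Normal(μ, σ)
end

chain_alt = sample(test_scores_alt(scores_new), NUTS(), 1000)
```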
Finally, I would highly recommend you check up on all of this (maybe with some of the resources I linked above) because I am nowhere near an expert/authority on this topic, but I'm glad to help out! There are some legitimate subject matter experts who participate regularly in this forum, so if we're lucky one of them will chime in.
Happy to report that your answers to my questions were in line with what I was intuitively thinking. Nonetheless, the example you provided gave me a lot of insight! So thank you once more.
I've been wondering how to clip distributions and didn't know about `truncated`. Nice to see it show up here!
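For anyone else who hadn't seen it, a quick example of what `truncated` does:

```julia
using Distributions

t = truncated(Normal(20, 10), 0, 50)  # Normal(20, 10) clipped to [0, 50]
minimum(t), maximum(t)                # (0.0, 50.0)
rand(t, 5)                            # samples always land in [0, 50]
```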
I’ll be checking those links out as well.