I have the following type of problem: imagine a Monte Carlo simulation generating some data, and I want to estimate the density function for that simulation. I cannot store the data for each Monte Carlo step, which would be prohibitive, but I want to do something better than simply constructing a histogram.
To provide a concrete example:
- In theory a nice way to obtain the density would be:
julia> using Plots, KernelDensity
julia> data = randn(10^4);
julia> k = kde(data);
julia> plot(k.x, k.density)
which gives me a nice, smooth density:
- However, I cannot really store the data. So my approach is to build a histogram on the fly:
julia> function mc_histogram(nsteps)
           nbins = 100
           limits = (-5, 5)
           binwidth = (limits[2] - limits[1]) / nbins
           # bin centers of the histogram grid
           domain = range(limits[1] + binwidth/2, limits[2] - binwidth/2; length=nbins)
           hist = zeros(nbins)
           n = 0
           for _ in 1:nsteps
               x = randn()
               if limits[1] < x < limits[2]
                   n += 1
                   bin = floor(Int, (x - limits[1]) / binwidth) + 1
                   hist[bin] += 1
               end
           end
           return domain, hist ./ (n * binwidth) # normalize to a density
       end
mc_histogram (generic function with 1 method)
julia> x, h = mc_histogram(10^4);
julia> plot(x,h)
Histograms, though, are ugly:
Thus, what I’m searching for is a way to build, without storing the data, a kernel density estimate that approximates the density better than the histogram does.
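What I imagine is something like accumulating the kernel contributions on a fixed grid as each sample arrives, along the lines of the sketch below (the name mc_kde and the fixed bandwidth of 0.2 are arbitrary choices of mine, whereas kde above picks its bandwidth from the data):
julia> function mc_kde(nsteps; limits=(-5, 5), npoints=100, bandwidth=0.2)
           grid = range(limits[1], limits[2]; length=npoints)
           density = zeros(npoints)
           for _ in 1:nsteps
               x = randn()
               # add a Gaussian kernel centered at the sample, evaluated on the grid
               density .+= exp.(-0.5 .* ((grid .- x) ./ bandwidth) .^ 2) ./ (bandwidth * sqrt(2π))
           end
           return grid, density ./ nsteps
       end
julia> xk, dk = mc_kde(10^4);
julia> plot(xk, dk)
That works without storing the data, but I have to pick the bandwidth by hand, which is part of what I was hoping the kernel-density machinery would do for me.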
One alternative I was told about is the use of average shifted histograms (which this package implements). That strategy seems adequate, though I’m unsure how much better justified it is relative to just computing a running average or some other smoothing function over the final discrete histogram (see the sketch below).
I’m quite unfamiliar with the world of these kernel estimators, so I’m looking for alternatives. Any ideas, suggestions? Are the average shifted histograms the best alternative out there?
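By smoothing the final histogram I mean something along these lines (again only a sketch; the Gaussian weights and the width parameter are arbitrary choices of mine):
julia> function smooth_histogram(x, h; width=3)
           smoothed = similar(h)
           for i in eachindex(h)
               # weighted average of neighboring bins, with Gaussian weights
               js = max(firstindex(h), i - 3width):min(lastindex(h), i + 3width)
               w = exp.(-0.5 .* ((js .- i) ./ width) .^ 2)
               smoothed[i] = sum(w .* h[js]) / sum(w)
           end
           return x, smoothed
       end
julia> xs, hs = smooth_histogram(x, h);
julia> plot(xs, hs)
It does smooth things out, but it feels ad hoc, hence the question about how justified the alternatives are.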
The alternative with AverageShiftedHistograms gives:
julia> using AverageShiftedHistograms
julia> function mc_ash(nsteps)
           # start the estimate on a fixed grid with a single first sample
           a = ash(randn(); rng=range(-5, 5, length=100))
           for _ in 1:nsteps
               x = randn()
               ash!(a, x) # update the running estimate with the new sample
           end
           return a
       end
mc_ash (generic function with 1 method)
julia> a = mc_ash(10^4);
julia> plot(a.density)
The result doesn’t look much smoother than the histogram (there are parameters to adjust, though), but at least it allows the estimate to be improved incrementally.
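For instance, I suppose I can ask for more smoothing when constructing the estimator (if I understand the package correctly, the keyword m sets how many shifted histograms are averaged; mc_ash_smooth and m = 20 here are just my guesses):
julia> function mc_ash_smooth(nsteps; m=20)
           # same as mc_ash above, but with a larger smoothing parameter m
           a = ash(randn(); rng=range(-5, 5, length=100), m=m)
           for _ in 1:nsteps
               ash!(a, randn())
           end
           return a
       end
julia> a = mc_ash_smooth(10^4);
julia> plot(a.density)
which should give a smoother curve at the cost of some resolution.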