I have the following type of problem: imagine a Monte-Carlo simulation generating some data, and I want to estimate the density function for that simulation. I cannot store the data for each Monte-Carlos step, which would be prohibitive, but I want to do something better than simply constructing a histogram.
To provide a concrete example:
- In theory a nice way to obtain the density would be:
julia> using Plots, KernelDensity julia> data = randn(10^4); julia> k = kde(data); julia> plot(k.x, k.density)
which gives me the pretty density:
- However, I cannot really store the data. So my approach is to build an histogram on the flight:
julia> function mc_histogram(nsteps) hist = zeros(100) limits = (-5,5) domain = range(limits, limits; length=100) n = 0 for _ in 1:nsteps x = randn() if limits < x < limits n += 1 bin = floor(Int, 100*(x - limits) / (limits-limits)) + 1 hist[bin] += 1 end end return domain, hist ./ n end mc_histogram (generic function with 1 method) julia> x, h = mc_histogram(10^4); julia> plot(x,h)
Histograms, though, are ugly:
Thus, what I’m searching for is a way to, without storing the data, build a kernel density estimate that will be a better estimate of the density than the histogram.
One alternative I was told was the use of average shifted histograms (which this package implements). That strategy seems adequate, though I’m unsure how more justified it is relative to just computing a running average of other smoothing function on the final discrete histogram.
I’m quite unfamiliar with the world of these kernel estimators, so I’m looking for alternatives. Any ideas, suggestions? Are the average shifted histograms the best alternative out there?
The alternative with
julia> function mc_ash(nsteps) a = ash(randn(); rng=range(-5, 5, length=100)) for _ in 1:nsteps x = randn() ash!(a, x) end return a end mc_ash (generic function with 1 method) julia> a = mc_ash(10^4); julia> plot(a.density)
Which doesn’t look much smoother than the histogram (there are parameters to adjust, though), but at least allows the incremental improvement of the estimates.