Many packages can fit a distribution, or a mixture of distributions, to data with MLE or chi-squared methods. But what should I use if my data is already binned? For example:
using Distributions, StatsBase
n = Normal()
u = Uniform(-1,1)
s() = rand() < 0.8 ? rand(n) : rand(u)
data = [s() for _=1:10^5]
binned = fit(Histogram, data)
Let’s say the goal is to recover 0.8 as the most likely weight, assuming we know the center of the normal and the range of the uniform.
One idea is of course to use a curve-fitting package such as LsqFit, treating the PDF as a curve and the bin positions and bin heights as x and y. Another is to sample from the histogram to “recover” the underlying data.
What would be a ready-made way in Julia’s ecosystem?
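To make the curve-fitting idea concrete, here is a minimal sketch that uses only Base Julia and the Random standard library. It assumes both component shapes are fully known (a standard normal and Uniform(-1, 1)), so the expected bin count is linear in the weight and least squares has a closed-form solution. The helper names (`normal_pdf`, `uniform_pdf`, `draw`), the hand-rolled binning, and the bin edges are illustrative choices, not part of any package.

```julia
using Random

# Known component shapes (an assumption): standard normal and Uniform(-1, 1)
normal_pdf(x) = exp(-x^2 / 2) / sqrt(2π)
uniform_pdf(x) = -1 <= x <= 1 ? 0.5 : 0.0

rng = MersenneTwister(1)
draw() = rand(rng) < 0.8 ? randn(rng) : 2 * rand(rng) - 1
data = [draw() for _ in 1:10^5]

# Bin by hand; the edges are chosen so that -1 and 1 fall exactly on bin edges
edges = -4:0.25:4
width = step(edges)
counts = zeros(Int, length(edges) - 1)
for x in data
    if first(edges) <= x < last(edges)
        counts[floor(Int, (x - first(edges)) / width) + 1] += 1
    end
end
centers = [(edges[i] + edges[i + 1]) / 2 for i in 1:length(counts)]

# Expected count in a bin ≈ N * width * (w * fN + (1 - w) * fU), evaluated at the
# bin center. This is linear in w, so unweighted least squares gives w directly.
N = length(data)
a = N * width .* (normal_pdf.(centers) .- uniform_pdf.(centers))
b = counts .- N * width .* uniform_pdf.(centers)
w_hat = sum(a .* b) / sum(a .^ 2)
println("estimated normal weight: ", w_hat)
```

Evaluating the PDF at the bin center is only an approximation of the bin content, so narrower bins reduce the bias at the cost of noisier counts.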
Turns out a curve fit with LsqFit kind of works:
using Distributions, LsqFit, FHist
n = Normal(0, 2)
u = Uniform(-1, 1)
data = rand(n, 8000) # 80% distributed as gaussian
data = vcat(data, rand(u, 2000)) # 20% as flat background
h = Hist1D(data, -3:0.3:3);
xs = FHist.bincenters(h)
ys = h.hist.weights;
# let's assume we know the center of normal distribution is 0
# and we know uniform ranges from -1 to 1
# p[1] is sigma of Normal
# p[2] is strength for Normal, Uniform will have (1-p[2])
# p[3] is overall scale
mypdfN(x, sig) = pdf(Normal(0, sig), x)
mypdfU(x) = pdf(Uniform(-1, 1), x)
@. mypdf(x, p) = (mypdfN(x, p[1])*p[2] + (1-p[2]) * mypdfU(x))*p[3]
p0 = [3, 0.5, 10^4] # initial guess; 10^4 is the sample size, so the fitted scale should land near sample size * bin width (about 3000)
lb = [0.1, 0., 0.] # lower bounds because these parameters can't go negative
myfit = curve_fit(mypdf, xs, ys, p0; lower=lb);
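As a possible alternative to unweighted least squares, the same binned data can be fit with a Poisson likelihood per bin. The sketch below is not LsqFit's method. It uses only Base Julia and Random, and it assumes sigma is known and fixed at 2, so that only the weight is fitted. It also integrates each component over the bin instead of evaluating the PDF at the bin center, because the uniform's edges at -1 and 1 do not line up with the bin edges of -3:0.3:3. The names `normal_bin`, `uniform_bin`, `nll`, and `golden_min` are made up for illustration.

```julia
using Random

# Assumed known shapes: Normal(0, σ) with σ = 2, and Uniform(-1, 1)
normal_pdf(x, σ) = exp(-x^2 / (2σ^2)) / (σ * sqrt(2π))
# Probability content of the bin [lo, hi) for each component
normal_bin(lo, hi, σ) = (hi - lo) / 6 *
    (normal_pdf(lo, σ) + 4 * normal_pdf((lo + hi) / 2, σ) + normal_pdf(hi, σ))  # Simpson's rule
uniform_bin(lo, hi) = max(0.0, min(hi, 1.0) - max(lo, -1.0)) / 2  # exact overlap with [-1, 1]

rng = MersenneTwister(42)
data = vcat(2 .* randn(rng, 8000), 2 .* rand(rng, 2000) .- 1)

edges = -3:0.3:3
nbins = length(edges) - 1
counts = [count(x -> edges[i] <= x < edges[i + 1], data) for i in 1:nbins]

pN = [normal_bin(edges[i], edges[i + 1], 2.0) for i in 1:nbins]
pU = [uniform_bin(edges[i], edges[i + 1]) for i in 1:nbins]
N = length(data)

# Poisson negative log-likelihood, dropping terms that do not depend on w
function nll(w)
    μ = N .* (w .* pN .+ (1 - w) .* pU)
    return sum(μ .- counts .* log.(μ))
end

# Golden-section search on [lo, hi]; adequate here because nll is convex in w
function golden_min(f, lo, hi; tol = 1e-6)
    φ = (sqrt(5) - 1) / 2
    c, d = hi - φ * (hi - lo), lo + φ * (hi - lo)
    while hi - lo > tol
        if f(c) < f(d)
            hi = d
        else
            lo = c
        end
        c, d = hi - φ * (hi - lo), lo + φ * (hi - lo)
    end
    return (lo + hi) / 2
end

w_hat = golden_min(nll, 0.01, 0.99)
println("estimated normal weight: ", w_hat)
```

With sigma or the overall scale also free, the one-dimensional search would need to be swapped for a multivariate optimizer, but the likelihood itself stays the same.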