# Use a Histogram as a function

I would like to see built-in functionality that allows me to use a Histogram as a function that returns the weight of the histogram at a given point in state space. The closest thing i can find is the `kde()` function from `KernelDensity.jl` which produces a distribution, say `p` , from data `d` via `p=kde(d)` . The distribution can then be evaluated at a point `x` in state space via `pdf(p,x)` . However, `kde()` takes too long for my purposes, but `h=fit(Histogram,d)` is rather quick in comparison. Furthermore, my state space is multivariate, so `kde()` will only let me approximate the marginal distributions while `fit(Histogram,d)` lets me approximate multivariate distributions. It would be great if i could take `h` and compute the weight at `x` via something like `pdf(h,x)` . I understand Histograms come with edges and weights, so all the necessary ingredients are there.

The reason i want this is for fast Bayesian inference. I’m using an MCMC-like approach to approximate the likelihood surface. Instead of producing a function that approximates the likelihood, this approach produces a sample whose histogram is proportional to an approximation of the likelihood.

You might be interested in my answer here: https://stackoverflow.com/questions/58336730/return-the-frequency-in-a-bin-of-a-2d-histogram-in-julia/58337134?noredirect=1#comment103041577_58337134

If you want to make this built-in probably best to do a PR to StatsBase

3 Likes

Thanks for the quick reply. Your `get_freq()` function looks like exactly what I need. Do you think this is the fastest way to perform this task in julia?

How do i do a PR to StatsBase?

Not sure it’s the fastest, but it’s surely orders of magnitude faster than what you were doing:

``````using StatsBase, BenchmarkTools

# Example data
data = (randn(10_000), randn(10_000))

h2d = fit(Histogram, data)

function get_freq(h, xval, yval)
x = searchsortedfirst(h.edges, xval)
y = searchsortedfirst(h.edges, yval)
h.weights[x, y]
end

@btime fit(Histogram, data) # 929.900 μs (3 allocations: 1.47 KiB)

@btime get_freq(h2d, 1.4, 0.6) #  70.832 ns (0 allocations: 0 bytes)

using KernelDensity

p = kde(data)

@btime kde(data) # 4.011 ms (166 allocations: 2.67 MiB)

@btime pdf(p, 1.4, 0.6) # 1.717 ms (208 allocations: 1.67 MiB)
``````

Maybe try and profile this within your actual application, if it turns out to be a bottleneck I’m sure there are people on this forum that will manage to beat my suggestion quite handily As for the PR, maybe start off by creating an issue on the StatsBase repo to see what the maintainers think about having this functionality, how it should integrate into the existing API etc.

2 Likes