I’m working on a project where I need to compute a weighted histogram of something many times. Basically I need the sum of the weights falling in each bin, and up to this point I’ve been using PyCall to call SciPy’s `binned_statistic` function, as there wasn’t an obvious equivalent I could find in Julia.

While this is fine, it’s always annoyed me, and I recently decided I could just write it myself in native Julia and stop being annoyed by it. So I took this naive stab at it, which works, but is twice as slow as calling the Python version (~60 ms vs. ~30 ms):

```
function histSum(x::Array{Float64}, y::Array{Float64}, bins::Int=100) # ~61 ms
    x = vec(x); y = vec(y)
    binMax = maximum(x); binMin = minimum(x)
    binEdges = range(binMin, stop=binMax, length=bins+1)
    result = zeros(bins)
    for i = 1:bins-1
        result[i] = sum(y[(x .>= binEdges[i]) .& (x .< binEdges[i+1])])
    end
    # last bin is closed on the right so points at x == binMax are counted
    result[end] = sum(y[(x .>= binEdges[end-1]) .& (x .<= binEdges[end])])
    return binEdges, result
end
```
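A big part of the slowdown here is algorithmic: the loop scans all of `x` once per bin (allocating a boolean mask and a gathered copy of `y` each time), so the work is O(bins × n). A single pass that computes each element’s bin index directly does O(n) work total. Here’s a minimal sketch of that idea in plain Python (no NumPy) — `hist_sum` is a made-up name, and it assumes `x` isn’t constant so the bin width is nonzero:

```python
def hist_sum(x, y, bins=100):
    """Weighted per-bin sums in a single pass over the data."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins  # assumes hi > lo
    result = [0.0] * bins
    for xi, yi in zip(x, y):
        # clamp so xi == hi lands in the last bin (right-inclusive edge)
        i = min(int((xi - lo) / width), bins - 1)
        result[i] += yi
    return result
```

The same single-pass structure translates almost line-for-line into a Julia loop over `eachindex(x)`, which should close most of the gap before any threading enters the picture.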

I posted the problem on the Julia slack and there were some great suggestions from Dave MacMahon, including this much more elegant solution that uses the built-in StatsBase histogram function:

```
using StatsBase
function histSum(x::Array, y::Array, bins::Int=100)
    h = fit(Histogram, vec(x), aweights(vec(y)), nbins=bins)
    h.edges, h.weights
end
```

But this way is nearly identical in speed to my first naive version, so still twice as slow as using PyCall. The relevant Python source code for `binned_statistic` is here; in the summation case it relies on NumPy’s `bincount` function, which calls into C code I can’t find.
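For reference, the summation path can be reproduced in a couple of NumPy calls: `np.digitize` against the interior edges yields a 0-based bin index per element, and `np.bincount` with `weights=` does the accumulation in C. A sketch (the function name `hist_sum_bincount` is made up; this mimics, not copies, what SciPy does):

```python
import numpy as np

def hist_sum_bincount(x, y, bins=100):
    """Weighted per-bin sums via np.digitize + np.bincount."""
    x = np.ravel(np.asarray(x, dtype=float))
    y = np.ravel(np.asarray(y, dtype=float))
    edges = np.linspace(x.min(), x.max(), bins + 1)
    # Interior edges only: values below edges[1] map to bin 0, values at or
    # above edges[-2] map to bins-1, so the last bin is right-inclusive.
    idx = np.digitize(x, edges[1:-1])
    return edges, np.bincount(idx, weights=y, minlength=bins)
```

Since `bincount` is a single C loop over the bin indices, it is doing the O(n) single-pass accumulation rather than one scan per bin.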

One thought: maybe the NumPy version is automatically multi-threaded behind the scenes, and that’s responsible for the discrepancy? If I place `Threads.@threads` in front of the for loop in my first version of the code, I can beat the Python version by a few ms. Any suggestions much appreciated!