Weighted histogram ~2x as slow in Julia vs. Python

kirklong · January 20, 2022, 6:55pm

I’m working on a project where I need to compute the weighted histogram of something many times. Basically I need the sum of each of the bins of the histogram, and I’ve been using PyCall to call the SciPy “binned statistic” method until this point as there wasn’t an obvious equivalent I could find in Julia.

While this is fine, it’s always annoyed me, and I recently decided I could just write it myself in native Julia and stop being annoyed by it. So I took this naive stab at it, which works, but is twice as slow as calling the Python version (~30 vs. ~60 ms):

function histSum(x::Array{Float64,},y::Array{Float64,},bins::Int=100) #61 ms
    x = vec(x); y = vec(y)
    binMax = maximum(x); binMin = minimum(x)
    binEdges = range(binMin,stop=binMax,length=bins+1)
    result = zeros(length(binEdges)-1)
    for i=1:bins-1
        result[i] = sum(y[(x .>= binEdges[i]) .& (x .< binEdges[i+1])])
    end
    result[end] = sum(y[(x .>= binEdges[end-1]) .& (x .<= binEdges[end])])
    return binEdges, result
end

I posted the problem on the Julia slack and there were some great suggestions from Dave MacMahon, including this much more elegant solution that uses the built-in StatsBase histogram function:

using StatsBase
function histSum(x::Array,y::Array,bins::Int=100)
    h = fit(Histogram, vec(x), aweights(y), nbins=bins)
    h.edges, h.weights
end

But this way is nearly identical in speed to my first naive way, so still twice as slow as using PyCall. The relevant Python source code for binned_statistic is here, which in the summation case relies on the numpy bincount function which is calling some C code I can’t find.

One thought is that maybe the numpy version is automatically multi-threaded behind the scenes and this is responsible for the discrepancy? If I place Threads.@threads in front of the for loop in my first version of the code I can beat the Python version by a few ms. Any suggestions much appreciated!

DNF · January 20, 2022, 7:10pm

When you have performance issues, looking for unnecessary allocations is often a good idea. In this line you are creating one BitArray and one regular array for each bin.

If you can write this out using loop, using only scalar operations, there should be a decent speedup, I think. Maybe for each element search for the right bin using one of the sorted search functions.

kirklong · January 20, 2022, 7:18pm

This was a suggestion I got on the Slack as well (to try sorting instead) and we came up with this implementation:

function histSumSort(x::Array{Float64,},y::Array{Float64,},bins::Int=100) #87 ms
    x = vec(x); y = vec(y)
    p = sortperm(x)
    #sortedX = x[p]; sortedY = y[p]
    bins = range(minimum(x),stop=maximum(x),length=bins+1)
    I = 1
    result = zeros(length(bins)-1)
    for i in p
        while x[i] > bins[I+1]
            I += 1
        end
        result[I] += y[i]
    end
    return bins,result
end

Unfortunately this way is actually slower (by about 1/3) than my initial naive guess…but maybe you can think of a better way to write this?

GunnarFarneback · January 20, 2022, 7:19pm

Try this:

function histSum(x::Array{Float64,}, y::Array{Float64,}, bins::Int=100)
    binMin, binMax = extrema(x)
    result = zeros(bins)
    α = bins / (binMax - binMin)
    for (x, y) in zip(x, y)
        i = min(bins, 1 + floor(Int, α * (x - binMin)))
        result[i] += y
    end
    return range(binMin, stop=binMax, length=bins+1), result
end

kirklong · January 20, 2022, 7:24pm

This does it! What about this makes it so much faster? I notice if I place the @threads macro in front of this for loop I don’t get any speedup, and the time it takes is basically identical to my first way with the @threads macro, so I’m wondering if somehow this formulation is by default multi-threaded or something?

GunnarFarneback · January 20, 2022, 7:28pm

Since the bins are uniform you can directly compute into which bin each element go, which is much faster than searching for the bin.

kirklong · January 20, 2022, 7:29pm

Thanks so much!

DNF · January 20, 2022, 7:35pm

I didn’t actually mean that you should sort, but exploit that the bins are sorted. @GunnarFarneback’s solution is an even faster version of that.

goerch · January 20, 2022, 7:47pm

GunnarFarneback:

function histSum(x::Array{Float64,}, y::Array{Float64,}, bins::Int=100)
    binMin, binMax = extrema(x)
    result = zeros(bins)
    α = bins / (binMax - binMin)
    for (x, y) in zip(x, y)
        i = min(bins, 1 + floor(Int, α * (x - binMin)))
        result[i] += y
    end
    return range(binMin, stop=binMax, length=bins+1), result
end

Interestingly, changing extrema back to minimum and maximum seems quite a bit faster on my system (Julia 1.7.1)? Otherwise very nice!

jling · January 20, 2022, 8:13pm

https://github.com/Moelf/FHist.jl

give this a try if you want

albheim · January 20, 2022, 9:19pm

I can see a similar difference here, also 1.7.1. Seems like something must be wrong with the extrema implementation?

julia> a = randn(1000);

julia> @btime extrema($a)
  5.849 μs (0 allocations: 0 bytes)
(-3.155119094596879, 3.0526046427540967)

julia> @btime minimum($a), maximum($a)
  1.744 μs (0 allocations: 0 bytes)
(-3.155119094596879, 3.0526046427540967)

kirklong · January 20, 2022, 9:22pm

I’m only using 1.6.1, and there’s less of a difference but indeed the minimum / maximum way is faster by ~40% which does seem strange.

lmiq · January 20, 2022, 9:36pm

https://github.com/JuliaLang/julia/issues/31442

tbeason · January 20, 2022, 9:36pm

The fact that extrema is slow compared to separate calls of minimum and maximum has had multiple github issues raised I believe.

Topic		Replies	Views
Speeding up `fit(Histogram)` Performance	10	759	November 17, 2023
Performance optimization of a custom binning function Performance	43	1031	December 15, 2022
Use a Histogram as a function General Usage	4	2188	October 21, 2019
Hist() and hits() in Julia? Statistics question	2	3998	December 30, 2016
How to iterate through a StatsBase.Histogram General Usage question	3	697	March 1, 2020

Weighted histogram ~2x as slow in Julia vs. Python

Related topics