Performance optimization of a custom binning function

Dan · December 14, 2022, 12:46am

materializes a vector of Bools of length(x), hence the slowdown.

But, there is still a bit more to optimize, as:

still searches the vector x from the start each time, and not from the last position - this is a O(n^2) time algo and when x grows it will get worse.

On a different matter, looking at the bin counts in the results:

julia> ans2 = bin2(x, y, xbin_edges)
([0.05250221420254687, 0.15445935346995493, 0.2509019085421154, 0.34946848820087534, 0.4489906779469514, 0.5480592397942313, 0.6459265647762795, 0.7482272000306962, 0.8492190264532677, 0.0], [0.479559946078053; 0.5409834483815402; … ; 0.49189476192543485; 0.0;;], [100, 95, 95, 106, 107, 112, 99, 91, 87, 0])

julia> sum([100, 95, 95, 106, 107, 112, 99, 91, 87, 0])
892

892 isn’t a good sum, as it is, eh… less than 1000. This is a correctness bug.

Here is another version, which hopefully addresses this bug (and adds some others probs):

function bin3(
    x::Vector{<:Number},
    y::Array{<:Number},
    xbin_edges::Union{Vector{<:Number},StepRangeLen{}};
    method::F = mean,
) where {F<:Function}
    issorted(xbin_edges; lt = <) || error("xbin_edges not sorted")
    ndims(y) > 2 && error("y must be a vector or matrix")
    size(x, 1) == size(y, 1) || error("x and y must have the same number of rows")
    # sort along x for faster binning
    if !issorted(x)
        p = sortperm(x)
        x = x[p]
        y = y[p]
    end
    # initialize outputs
    n_bin = length(xbin_edges) - 1
    ind = zeros(Int, n_bin + 1)
    i, bi = 1, 1
    for be in xbin_edges
        ff = findfirst(≥(be), @view x[i:end])
        if isnothing(ff)
            ind[bi] = length(x)
            break
        end
        ind[bi] = ff + i - 1
        bi += 1
        i += ff
    end
    last_bin = bi - 1
    x_binned = zeros(eltype(x), n_bin)
    y_binned = zeros(eltype(y), n_bin, size(y, 2))
    bin_count = zeros(Int, size(x_binned))
    # find binned values
    for i = 1:last_bin
        bin_count[i] = ind[i+1] - ind[i]
        x_binned[i] = method(@view x[ind[i]:(ind[i+1]-1)])
        y_binned[i, :] = method((@view y[ind[i]:(ind[i+1]-1), :]), dims = 1)
    end

    return x_binned, y_binned, bin_count
end

Edit: As PeterSimon pointed out, using findnext would be more readable (should have remembered this).

PeterSimon · December 14, 2022, 12:46am

Also, [findfirst(x .>= be for be in xbin_edges for be in xbin_edges] creates a new true/false vector for each entry in xbin_edges. Better to pass a function as the first argument of findfirst and avoid all the allocations.

Edit: Crossed paths with Dan’s post, which pointed out the same thing. Yes, one could use findnext iteratively to avoid the inefficiency of multiple findfirst calls.

bertschi · December 14, 2022, 12:52am

findfirst actually gives an O(n*m) algorithm with n = length(x) and m = length(xbin_edges). As an improvement one might use searchsortedfirst(x, be) instead which should be able to exploit the fact that x is sorted. It does not address the correctness bug, but should presumably run in O(m * log n), i.e., even better than O(n) if m << n.

Dan · December 14, 2022, 12:58am

searchsortedfirst should be a O(log n) binary search, so theoretically faster than O(n), but specific instance and hardware performance issues will determine faster approach.

Interestingly, a searchsortedguess version could be made which assumes some sort of “uniform” distribution of x and tries to binary search with bias towards a guessed location of the next bin cutoff. This might be going on a tangent though.

Edit: Again, cross-posting with bertschi about the log(N)-ess of sorted search.

alex-s-gardner · December 14, 2022, 1:06am

In this case this is intentional as we only want bins to include data within bin limits… even if the bin limits do not cover the full range of data

Dan · December 14, 2022, 1:10am

julia> xbin_edges
0.0:0.1:1.0

julia> extrema(x)
(0.0006307304436002914, 0.9988458402204999)

The bins include all the data in this test case.

alex-s-gardner · December 14, 2022, 1:43am

Good catch!

alex-s-gardner · December 14, 2022, 1:44am

Thanks all for taking the time… I learned a ton! Very much appreciated.

alex-s-gardner · December 14, 2022, 5:16am

Found a couple of more mistakes… this is what I settled on, posting for anyone that comes next:

function bin(
    x::Vector{<:Number},
    y::Array{<:Number},
    xbin_edges::Union{Vector{<:Number},StepRangeLen{}};
    method::F = mean,
) where {F<:Function}
    
    issorted(xbin_edges; lt = <) || error("xbin_edges not sorted")
    ndims(y) > 2 && error("y must be a vector or matrix")
    size(x, 1) == size(y, 1) || error("x and y must have the same number of rows")
    
    # sort along x for faster binning
    if !issorted(x)
        p = sortperm(x)
        x = x[p]
        y = y isa Vector ? y[p] : y[p,:]
    end

    # initialize outputs
    n_bin = length(xbin_edges) - 1
    ind = zeros(Int, n_bin + 1)
    i, bi = 1, 1
    for be in xbin_edges
        ff = searchsortedfirst((@view x[i:end]), be)
        if isnothing(ff)
            ind[bi] = length(x) + 1
            break
        end
        ind[bi] = ff + i - 1
        bi += 1
        i += ff - 1 
    end
    
    last_bin = bi - 2
    x_binned = zeros(eltype(x), n_bin)
    y_binned = zeros(eltype(y), n_bin, size(y, 2))
    bin_count = zeros(Int, size(x_binned))

    # find binned values
    for i = 1:last_bin
        bin_count[i] = ind[i+1] - ind[i]
        if bin_count[i] > 0
            x_binned[i] = method(@view x[ind[i]:(ind[i+1]-1)])
            y_binned[i, :] = method((@view y[ind[i]:(ind[i+1]-1), :]), dims = 1)
        end
    end

    return x_binned, y_binned, bin_count
end

bertschi · December 14, 2022, 10:10am

Just another few remarks on the searchsortedfirst piece:

According to the docs searchsortedfirst already returns lastindex(x) + 1 if all values of x are greater than be, i.e., the if isnothing(ff) block will be unreachable.
Changing the start index i in searchsortedfirst((@view x[i:end]), be) seems unnecessary as it will only cutoff a few steps in the binary search compared to searchsortedfirst(x, be) – the search is O(log n) after all – at the expense of additional complexity of keeping track of the indices in your code and some overhead from @view x[i:end] which also needs to map sub-indices to the parent array on each access.

In any case, the best algorithm surely depends on your use case, i.e., the sizes of x and xbin_edges. I.e., for short vectors x an O(n) search exploiting that both x and xbin_edges are sorted could be faster than the binary search of searchsortedfirst due to memory locality (binary search does random access in any case). Have not benchmarked this though …

Dan · December 14, 2022, 11:25am

Since performance juice has been squeezed from this function. Here is another version that taps some succinctness juice:

using StatsBase, BlockedArrays

x = rand(1000); y = rand(1000); xbin_edges = 0:0.1:1; method = :mean

function bin4(x, y, b, f = mean)
    p = sortperm(mortar([x,b]))
    bins = findall(p.>length(x))
    bin_count = diff(bins).-1
    x_binned = f.(x[p[bins[i]+1:bins[i+1]-1]] for i in 1:length(b)-1)
    y_binned = f.(y[p[bins[i]+1:bins[i+1]-1],:] for i in 1:length(b)-1)
    return x_binned, y_binned, bin_count
end

Gives similar results:

julia> ans4 = bin4(x, y, xbin_edges)
([0.05121915383957467, 0.14802010669692384, 0.25479136196099045, 0.3534202855299341, 0.44822152654780295, 0.5520721732892945, 0.6471681681318727, 0.7462173706421277, 0.8528325708490744, 0.9481594827553909], 
 [0.4935288649339929, 0.4978985328343402, 0.5233712241230352, 0.5127680432385109, 0.5095357581410285, 0.5337653237252364, 0.5257035473226342, 0.508290731656027, 0.5080514760215488, 0.4612049530088362],
 [69, 108, 107, 97, 109, 104, 99, 101, 99, 107])

This version can also be optimized a bit for performance (but it isn’t too bad).

alex-s-gardner · December 14, 2022, 5:10pm

Taking into account @bertschi comments and simplifying gives a highly performant and readable solution:

function bin5(
    x::Vector{<:Number},
    y::Array{<:Number},
    xbin_edges::Union{Vector{<:Number},StepRangeLen{}};
    method::F = mean,
) where {F<:Function}
    
    issorted(xbin_edges; lt = <) || error("xbin_edges not sorted")
    ndims(y) > 2 && error("y must be a vector or matrix")
    size(x, 1) == size(y, 1) || error("x and y must have the same number of rows")
    
    # sort along x for faster binning
    if !issorted(x)
        p = sortperm(x)
        x = x[p]
        y = y isa Vector ? y[p] : y[p,:]
    end

    # initialize outputs
    ind = Int[searchsortedfirst(x, be) for be in xbin_edges]
    n_bin = length(xbin_edges) - 1
    x_binned = zeros(eltype(x), n_bin)
    y_binned = zeros(eltype(y), n_bin, size(y, 2))
    bin_count = zeros(Int, size(x_binned))
 
    # find binned values
    for i = 1:n_bin
        bin_count[i] = ind[i+1] - ind[i]
        if bin_count[i] > 0
            x_binned[i] = method(@view x[ind[i]:(ind[i+1]-1)])
            y_binned[i, :] = method((@view y[ind[i]:(ind[i+1]-1), :]), dims = 1)
        end
    end

    return x_binned, y_binned, bin_count
end

using Statistics, BenchmarkTools
n = 1000;
x = rand(n); y = rand(n); xbin_edges = 0:0.1:1;
@benchmark bin5(x, y, xbin_edges; method=mean)

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  31.718 μs …   9.859 ms  ┊ GC (min … max): 0.00% … 99.45%
 Time  (median):     34.044 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   41.035 μs ± 179.824 μs  ┊ GC (mean ± σ):  8.72% ±  1.99%

   █▆                                                           
  ▅██▆▄▃▃▃▃▃▃▆▆▄▂▂▂▂▂▂▂▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▂▂ ▃
  31.7 μs         Histogram: frequency by time         81.6 μs <

alex-s-gardner · December 14, 2022, 5:12pm

Very compact solution that I’m sure could be optimized but as is runs about 100 times slower

Dan · December 15, 2022, 12:31am

Turns out, I had only to give up on using clever BlockedArrays, and the compact function actually benchmarked faster than bin5:

function bin4(x, y, b, f = mean)
    p = sortperm(vcat(x,b))
    bins = findall(p.>length(x))
    x_binned = [f(x[p[k]] for k in bins[i]+1:bins[i+1]-1)
                  for i in 1:length(b)-1]
    y_binned = [f(y[p[k],t] for k in bins[i]+1:bins[i+1]-1)
                  for i in 1:length(b)-1, t in 1:size(y,2)]
    return x_binned, y_binned, diff(bins).-1
end

(and it doesn’t need sorted bin_edges)

julia> @btime bin4($x, $y, $xbin_edges);
  41.895 μs (14 allocations: 21.52 KiB)

julia> @btime bin5($x, $y, $xbin_edges);
  46.254 μs (118 allocations: 40.41 KiB)

Edit: Sacrificing clarity, the function can be made to fit into the 7-line category of this thread https://discourse.julialang.org/t/seven-lines-of-julia-examples-sought/50416/1 :

function bin6line(x, y, b, f = mean)
    p = sortperm(vcat(x,b))
    bins = findall(p.>length(x))
    aggr = @views [f(z[p[bins[i]+1:bins[i+1]-1]]) for i in 1:length(b)-1, z in (x, eachcol(y)...) ]
    return @view(aggr[:,1]), @view(aggr[:,2:end]), diff(bins).-1
end

alex-s-gardner · December 15, 2022, 5:52am

Bravo… that’s impressive. I get about a 7% improvement in speed with this one… and in so few lines. Though I like more descriptive variable names for readability.

DNF · December 15, 2022, 6:25am

Should be

bins = findall(>(length(x)), p)

to avoid unnecessary temporary array.

Dan · December 15, 2022, 10:41am

@DNF: it has crossed my mind, but the other form is cleaner and benchmarking showed no difference. In fact, on a side note, why is there no difference in the following?

julia> vv  = rand(1:100,10_000);

julia> @btime findall($vv.>99);
  6.475 μs (4 allocations: 6.42 KiB)

julia> @btime findall(>(99), $vv);
  6.531 μs (4 allocations: 6.42 KiB)

(Julia 1.8.3)

The more interesting question is how to speed up the next line aggr = @view… ?

@alex-s-gardner : To get better naming, returning a named-tuple seems nice, especially since elements of tuple have different types. Something like:

(x_binned = @view(aggr[:,1]), y_binned = @view(aggr[:,2:end]), bin_count = diff(bins).-1)

as last bin6 line (other versions can be adapted similarly). Doing so allows destructuring into different vars, or accessing fields namefullly.

TL-CP (too long, just let me copy-paste please):

Code

using Statistics, BenchmarkTools

function bin5(
    x::Vector{<:Number},
    y::Array{<:Number},
    xbin_edges::Union{Vector{<:Number},StepRangeLen{}};
    method::F = mean,
) where {F<:Function}
    
    issorted(xbin_edges; lt = <) || error("xbin_edges not sorted")
    ndims(y) > 2 && error("y must be a vector or matrix")
    size(x, 1) == size(y, 1) || error("x and y must have the same number of rows")
    
    # sort along x for faster binning
    if !issorted(x)
        p = sortperm(x)
        x = x[p]
        y = y isa Vector ? y[p] : y[p,:]
    end

    # initialize outputs
    ind = Int[searchsortedfirst(x, be) for be in xbin_edges]
    n_bin = length(xbin_edges) - 1
    x_binned = zeros(eltype(x), n_bin)
    y_binned = zeros(eltype(y), n_bin, size(y, 2))
    bin_count = zeros(Int, size(x_binned))
 
    # find binned values
    for i = 1:n_bin
        bin_count[i] = ind[i+1] - ind[i]
        if bin_count[i] > 0
            x_binned[i] = method(@view x[ind[i]:(ind[i+1]-1)])
            y_binned[i, :] = method((@view y[ind[i]:(ind[i+1]-1), :]), dims = 1)
        end
    end

    return x_binned, y_binned, bin_count
end

function bin4(x, y, b, f = mean)
    p = sortperm(vcat(x,b))
    bins = findall(p.>length(x))
    x_binned = [f(x[p[k]] for k in bins[i]+1:bins[i+1]-1)
                  for i in 1:length(b)-1]
    y_binned = [f(y[p[k],t] for k in bins[i]+1:bins[i+1]-1)
                  for i in 1:length(b)-1, t in 1:size(y,2)]
    return x_binned, y_binned, diff(bins).-1
end

function bin6(x, y, b, f = mean)
    p = sortperm(vcat(x,b))
    bins = findall(>(length(x)), p)
    aggr = @views [f(z[p[bins[i]+1:bins[i+1]-1]]) for i in 1:length(b)-1, z in (x, eachcol(y)...) ]
    (x_binned = @view(aggr[:,1]), y_binned = @view(aggr[:,2:end]), bin_count = diff(bins).-1)
end

n = 1000;
x = rand(n); y = rand(n); xbin_edges = 0:0.1:1;

display(@benchmark bin5(x, y, xbin_edges; method=mean));
display(@benchmark bin4(x, y, xbin_edges));
display(@benchmark bin6(x, y, xbin_edges));
; # that's all

ADDENDUM: This calculation can also be done with DataFrames (or any DB-like interface) as it is a GroupBy like operation. So:

using DataFrames

dfbin(x, y, b, f = mean) =
  combine(
    groupby(transform(
      DataFrame(x=x,y=collect(eachrow(y)); copycols=false), 
      :x => ByRow(x -> searchsortedfirst(b,x)) => :bin
    ), :bin; sort=true), 
    Not(:bin) .=> (reduced(c) = f(c,dims=1)), :bin => length => :bin_count;
    keepkeys=false
  )

and we have:

julia> dfbin(x,y,xbin_edges)
10×3 DataFrame
 Row │ x_reduced  y_reduced   bin_count 
     │ Float64    Array…      Int64     
─────┼──────────────────────────────────
   1 │ 0.0434871  [0.541275]         92
   2 │ 0.151771   [0.521445]         85
   3 │ 0.250651   [0.5025]          113
   4 │ 0.346103   [0.440693]        107
   5 │ 0.446809   [0.523061]        111
   6 │ 0.555471   [0.491154]         89
   7 │ 0.649694   [0.483566]         91
   8 │ 0.751556   [0.44833]         102
   9 │ 0.850171   [0.555502]        105
  10 │ 0.948096   [0.535546]        105

It is slower (around 10x), but maybe optimization possible, especially if data comes from DataFrame already.

DNF · December 15, 2022, 12:07pm

Yeah, I find that odd as well, particularly that the allocations are the exactly the same every time.

In fact, this is really odd:

julia> vv = rand(1:100, 10_000);

julia> @btime findall(>(101), $vv);
  4.900 μs (4 allocations: 5.61 KiB)

julia> findall(>(101), vv)
Int64[]

Looked in the source:

findall(testf::F, A::AbstractArray) where {F<:Function} = findall(testf.(A))

Huh…

alex-s-gardner · December 15, 2022, 5:23pm

These two lines are so elegant

alex-s-gardner · December 15, 2022, 6:44pm

Dan:

 p = sortperm(vcat(x,b))
    bins = findall(p.>length(x))
    x_binned = [f(x[p[k]] for k in bins[i]+1:bins[i+1]-1)
                  for i in 1:length(b)-1]
    y_binned = [f(y[p[k],t] for k in bins[i]+1:bins[i+1]-1)
                  for i in 1:length(b)-1, t in 1:size(y,2)]

@Dan Note that we lost out check for empty bins, need something like:

# find bin breakpoints
    p = sortperm(vcat(x, xbin_edges))
    bins = findall(>(length(x)), p)
    bin_count = diff(bins).-1
    
    # calculate binned metrics
    x_binned = [(bin_count[i] > 0) ? method(x[p[bins[i]+1:bins[i+1]-1]]) : 0 for i in 1:length(xbin_edges)-1]
    y_binned = [(bin_count[i] > 0) ? method(y[p[bins[i]+1:bins[i+1]-1]]) : 0 for i in 1:length(xbin_edges)-1]

Topic		Replies	Views
Allocation / performance for 2d array sorts Performance	5	604	November 10, 2020
Performance input on my code Performance	8	844	October 15, 2018
Binning a 2D array New to Julia	8	993	September 17, 2020
Optimizing performance (newcomer from Matlab) General Usage first-steps	19	1211	November 8, 2019
Julia Binary Heap Benchmark Performance benchmark , java	12	1332	December 17, 2020

Performance optimization of a custom binning function

Related topics