I’m looking for feedback on how I might be able to further optimize the below bin function. There are a couple of coding patterns that I’m curious to get feedback on, specifically my approach to internal array initialization and method switching. These are coding patterns that I use often but I have a suspicion that they are not performant in Julia.

using Statistics
x = rand(100); y = rand(100); xbin_edges = 0:0.1:1; method = :mean
bin(x, y, xbin_edges; method)
"""
bin(x::Vector{<:Number}, y::Array{<:Number}, xbin_edges::Vector{<:Number}; method::Symbol = :mean)
Fast binning of `y` as a funciton `x`. returns `x_binned`, `y_binned`, `bin_count`. `method` specifies approached used for aggregating binned values.
"""
function bin(x::Vector{<:Number}, y::Array{<:Number}, xbin_edges::Union{Vector{<:Number}, StepRangeLen{}}; method::Symbol = :mean)
# make sure that xbin_edges are from smallest to largests
if !issorted(xbin_edges; lt = <)
error("xbin_edges not sorted")
end
if ndims(y) > 2
error("y must be a vector or 2D array")
end
if size(x,1) != size(y,1)
error("x and y must have the same number of rows")
end
# sort along x for faster binning
if !issorted(x)
p = sortperm(x)
x = x[p]
y = y[p]
end
# initialize outputs
ind = [findfirst(x .>= be) for be in xbin_edges]
n_bin = length(xbin_edges)-1;
x_binned = zeros(typeof(x[1]), n_bin)
y_binned = zeros(typeof(y[1]), n_bin, size(y,2))
bin_count = zeros(size(x_binned))
# find binned values
for i = 1:n_bin
if !isnothing(ind[i]) && !isnothing(ind[i+1]) && (ind[i] < ind[i+1])
bin_count[i] = ind[i+1] - ind[i];
# add new methods here
if method == :mean
x_binned[i] = mean(x[ind[i]:(ind[i+1]-1),:])
y_binned[i] = mean(y[ind[i]:(ind[i+1]-1),:])
elseif method == :median
x_binned[i] = median(x[ind[i]:(ind[i+1]-1),:])
y_binned[i] = median(y[ind[i]:(ind[i+1]-1),:])
end
end
end
return x_binned, y_binned, bin_count
end

Thanks for the suggestion but it seems adding @view(x[...]) or @views function has no impact on performance. I think this is because indexing into x is handled as a view by default, similar to Python.

n = 1000;
x = rand(n); y = rand(n); xbin_edges = 0:0.1:1; method = :mean
@benchmark bin(x, y, xbin_edges; method)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 53.861 μs … 10.949 ms ┊ GC (min … max): 0.00% … 99.19%
Time (median): 59.354 μs ┊ GC (median): 0.00%
Time (mean ± σ): 75.399 μs ± 290.890 μs ┊ GC (mean ± σ): 13.72% ± 3.56%
▄▆█▇▆▅▄▅▅▅▄▂▁▁▂▁▁▁▁▁ ▂
█████████████████████▇▇▆█▇▆▇▇▇▇▇▇▆▆▅▅▄▆▅▄▅▅▄▆▅▅▆▆▅▅▅▂▄▅▅▄▃▂▅ █
53.9 μs Histogram: log(frequency) by time 148 μs <
Memory estimate: 80.45 KiB, allocs estimate: 291.
@benchmark bin_views(x, y, xbin_edges; method)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 55.590 μs … 10.326 ms ┊ GC (min … max): 0.00% … 98.77%
Time (median): 62.511 μs ┊ GC (median): 0.00%
Time (mean ± σ): 79.364 μs ± 311.441 μs ┊ GC (mean ± σ): 14.89% ± 3.82%
▂▅▇█▇▆▄▃▄▃▄▅▄▁▁▁▁▁▁ ▂
██████████████████████▇▇▆▆▆▆▇▆▇▇▆▇▇▆▆▆▆▆▅▆▅▆▅▇▆▆▆▅▅▅▅▆▄▄▄▄▄▅ █
55.6 μs Histogram: log(frequency) by time 148 μs <
Memory estimate: 93.39 KiB, allocs estimate: 255.

function bin_views(x::Vector{<:Number}, y::Array{<:Number}, xbin_edges::Union{Vector{<:Number}, StepRangeLen{}}; method::Symbol = :mean)
# make sure that xbin_edges are from smallest to largests
if !issorted(xbin_edges; lt = <)
error("xbin_edges not sorted")
end
if ndims(y) > 2
error("y must be a vector or 2D array")
end
if size(x,1) != size(y,1)
error("x and y must have the same number of rows")
end
# sort along x for faster binning
if !issorted(x)
p = sortperm(x)
x = x[p]
y = y[p]
end
# initialize outputs
ind = [findfirst(x .>= be) for be in xbin_edges]
n_bin = length(xbin_edges)-1;
x_binned = zeros(typeof(x[1]), n_bin)
y_binned = zeros(typeof(y[1]), n_bin, size(y,2))
bin_count = zeros(size(x_binned))
# find binned values
for i = 1:n_bin
if !isnothing(ind[i]) && !isnothing(ind[i+1]) && (ind[i] < ind[i+1])
bin_count[i] = ind[i+1] - ind[i];
# add new methods here
if method == :mean
x_binned[i] = mean(@views(x[ind[i]:(ind[i+1]-1),:]))
y_binned[i] = mean(@views(y[ind[i]:(ind[i+1]-1),:]))
elseif method == :median
x_binned[i] = median(@views(x[ind[i]:(ind[i+1]-1),:]))
y_binned[i] = median(@views(y[ind[i]:(ind[i+1]-1),:]))
end
end
end
return x_binned, y_binned, bin_count
end

I concur with @lmiq. In addition, there are a couple of things that pop out at me. One of your assignment statements is

x_binned[i] = mean(x[ind[i]:(ind[i+1]-1),:])

but x is a vector, so the second index : is unneeded. Also, you write

y_binned[i] = median(y[ind[i]:(ind[i+1]-1),:])

where you index y_binned with a single integer index, but y_binned may be a matrix with more than a single column. This indexing is legal, but if there is more than one column, your code does not assign values to the second and subsequent columns. They are left at zero due to construction using zeros. It’s hard to imagine that is intended behavior.

@views with the s is to use views in the complete function.

(I wonder if there was a way to throw some warning in such situations, but probably that’s hard, as that could make sense depending on what follows the macro).

Edit: now I tested it here it you can use both @views or @view there, its the same.

There is a small reduction in the amount of memory allocated, but the speedup is not very important in this case:

That benchmark is consistent with what I get, but using @views or @view does not make any difference for me.

Be careful in interpolating the variables in the benchmark (add $ in @benchmark bin($x, $y, $xbin_edges), for example. In this case it is not making much of a difference, but in some cases it does, and it is the correct thing to do (because of the internals of the benchmarking macros).

@view applies to a single a[b] statement while @views applies the substitution to an entire code block or statement. This block can be function, for, if, let, begin, etc, or a single statement. So @views was totally legal where used, it just happens to operate identically to @view in that situation.

I would limit uses of things like @views or @inbounds to only single lines or blocks of maybe <10 lines at a time. Otherwise you risk accidentally applying it in places where you’ve forgotten you were using it. Also, someone reading your code may not notice it if it’s far away from the effect site.

I also don’t see any significant speed improvement from eliding the issorted test.

Here is my improved version:

function bin2(x::Vector{<:Number}, y::VecOrMat{<:Number}, xbin_edges::Union{Vector{<:Number}, StepRangeLen{}}; method::F=mean) where F <: Function
issorted(xbin_edges; lt = <) || error("xbin_edges not sorted")
size(x,1) == size(y,1) || error("x and y must have the same number of rows")
# sort along x for faster binning
if !issorted(x)
p = sortperm(x)
x = x[p]
y = y isa Vector ? y[p] : y[p,:]
end
# initialize outputs
ind = Int[x[end] ≥ be ? findfirst(≥(be), x) : 0 for be in xbin_edges]
n_bin = length(xbin_edges) - 1
x_binned = zeros(eltype(x), n_bin)
y_binned = zeros(eltype(y), n_bin, size(y,2))
bin_count = zeros(Int, size(x_binned))
# find binned values
for i = 1:n_bin
if !iszero(ind[i] * ind[i+1]) && (ind[i] < ind[i+1])
bin_count[i] = ind[i+1] - ind[i]
x_binned[i] = method(@view x[ind[i]:(ind[i+1]-1)])
y_binned[i,:] = method((@view y[ind[i]:(ind[i+1]-1), :]), dims=1)
end
end
return x_binned, y_binned, bin_count
end
x = rand(100); y = rand(100); xbin_edges = 0:0.1:1; method = :mean
ans1 = bin(x, y, xbin_edges)
ans2 = bin2(x, y, xbin_edges)
using LinearAlgebra: norm
@show maximum((norm(ans1[i] - ans2[i]) for i in 1:3)) # 0.0
using BenchmarkTools
@btime bin($x, $y, $xbin_edges); # 17.600 μs (215 allocations: 15.11 KiB)
@btime bin2($x, $y, $xbin_edges); # 5.383 μs (125 allocations: 11.22 KiB)
@btime bin($x, $y, $xbin_edges, method=:median); # 45.700 μs (539 allocations: 28.80 KiB)
@btime bin2($x, $y, $xbin_edges, method=median); # 29.700 μs (440 allocations: 24.77 KiB)

Note that I’m comparing to the corrected bin function as per the OP’s corrections.

Edit: Corrected type instability reported by @mikmoore , and eliminated a test by declaring y to be VecOrMat

As written, the line y = y[p] drops every column of y after the first. You probably didn’t mean to do that.

It also changes the type of y from Array{T,N} to Array{T,1}. Since this change happens in a conditional block, this will cause a type instability. Since it’s a union of concrete types this instability probably isn’t too detrimental to performance, but this still probably wasn’t what was intended.

To fix the stability, you’ll need to ensure that y retains its type throughout this transformation. It isn’t pretty, but something like y = y[ntuple(i -> i==1 ? p : (:), ndims(y))...] would probably accomplish this. I’m sure there’s a better way (maybe using permute), but haven’t bothered to work it out. You could also make several if statements based on ndims(y).

Yes, your original vector ind had to be of type Any because it could contain both Int and nothing. My change allows ind to be a Vector{Int} which can be accessed much more efficiently.

Re your earlier question, see the performance tip for specialization on function arguments.