Performance issue with Vectors vs. 1 Column Matrices

Buried inside a for loop of mine, I have a function which, morally, looks like:

function f!(a,b)
  @. a = b;
  a
end

It gets used inside the for loop like:

# construct vector x of length nx
# construct vector y of length ny > nx
for i in 1:iters
    # code
    rows = sample(1:ny, nx, replace = false); # grab random rows of y
    f!(x, y[rows]);
end

The above code works without issue. However, there are some cases where, instead of x and y being vectors, they are matrices with the same number of columns. The equivalent code is then:

# construct matrix x of size (nx,d)
# construct matrix y of size (ny,d)
for i in 1:iters
    # code
    rows = sample(1:ny, nx, replace = false); # grab random rows of y
    f!(x, y[rows,:]);
end

and notice that I have had to use y[rows,:]. Now, I could also use y[rows,:] in the vector case, but then I take a performance hit:

using Random, StatsBase, BenchmarkTools

Random.seed!(100);
x = zeros(100);
y = randn(1000);
rows = sample(1:1000, 100, replace=false);
@btime f!(x,y[rows]);

  124.491 ns (2 allocations: 928 bytes)

@btime f!(x,y[rows,:]);

  320.127 ns (2 allocations: 944 bytes)

Likewise, if I create x and y as zeros(100,1) and randn(1000,1), I take the same performance hit.

The case where I am only working with vectors is common enough for me that I don’t want to take this performance hit. To avoid it for now, I have two versions of my outer function (the one containing the for loop above): one for the vector case and one for the matrix case, selected via multiple dispatch, roughly as sketched below. But it seems really silly to duplicate the whole function when the only difference is y[rows] versus y[rows,:]. Is there a cleaner way to reconcile this?
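Concretely, the duplication looks something like this (a sketch rather than my actual code; the name outer! and the explicit iters argument are just for illustration):

function outer!(x::AbstractVector, y::AbstractVector, iters)
    for i in 1:iters
        # code
        rows = sample(1:size(y, 1), size(x, 1), replace = false);
        f!(x, y[rows]);
    end
    x
end

function outer!(x::AbstractMatrix, y::AbstractMatrix, iters)
    for i in 1:iters
        # code
        rows = sample(1:size(y, 1), size(x, 1), replace = false);
        f!(x, y[rows, :]);
    end
    x
end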

I think multiple dispatch is your answer here. There might be variants of selectdim which would work for a list of indices but I’m not aware of them.

About performance itself, two little tips:

  • don’t forget to interpolate global variables when you use BenchmarkTools.jl
  • try array views to avoid allocations (not always faster, see below)

const AV = AbstractVector
const AM = AbstractMatrix

partialcopy!(x::AV, y::AV, rows) = x .= y[rows]
partialcopy!(x::AM, y::AM, rows) = x .= y[rows, :]

partialcopy2!(x::AV, y::AV, rows) = x .= @view y[rows]
partialcopy2!(x::AM, y::AM, rows) = x .= @view y[rows, :]

Setup code:

using BenchmarkTools, StatsBase
x, X = zeros(100), zeros(100, 1)
y, Y = ones(1000), ones(1000, 1)
rows = sample(1:1000, 100, replace=false)

Results:

julia> @btime partialcopy!($x, $y, $rows);
  211.100 ns (1 allocation: 896 bytes)

julia> @btime partialcopy2!($x, $y, $rows);
  113.900 ns (0 allocations: 0 bytes)

julia> @btime partialcopy!($X, $Y, $rows);
  248.100 ns (1 allocation: 896 bytes)

julia> @btime partialcopy2!($X, $Y, $rows);
  332.664 ns (0 allocations: 0 bytes)

Interestingly, the array views speed things up for vectors and slow them down for matrices. I think you’d actually be better served by writing out the loop explicitly here.
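Something along these lines (just a sketch, not benchmarked; partialcopy3! is an illustrative name reusing the AV/AM aliases from above) copies the selected rows element by element, so there is no temporary array and no view wrapper at all:

function partialcopy3!(x::AV, y::AV, rows)
    @inbounds for (i, r) in enumerate(rows)
        x[i] = y[r]
    end
    x
end

function partialcopy3!(x::AM, y::AM, rows)
    # column-major order: fix a column, then walk down the selected rows
    @inbounds for j in axes(x, 2), (i, r) in enumerate(rows)
        x[i, j] = y[r, j]
    end
    x
end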

Inspired by this, I came up with the following variation, which also seems to accomplish the task:

function f!(a, b)
    @. a = b;
    a
end

function extract_rows(a::TA, rows) where {TA<:AbstractVector}
    return @view a[rows]
end

function extract_rows(a::TA, rows) where {TA<:AbstractMatrix}
    return @view a[rows,:]
end

using Random, StatsBase, BenchmarkTools

Random.seed!(100);
nx = 100;
ny = 1000;
x = zeros(nx);
y = randn(ny);
X = x[:, :];
Y = y[:, :];
rows = sample(1:ny, nx, replace=false);
@btime f!($(x), extract_rows($(y), $(rows)));
@btime f!($(X), extract_rows($(Y), $(rows)));

  81.176 ns (0 allocations: 0 bytes)
  123.551 ns (0 allocations: 0 bytes)

Thoughts?

The reason you’re getting better times is that you’re removing the time it takes to compute extract_rows from the benchmark (I think). If I understand correctly, BenchmarkTools only benchmarks the outer function, so to speak.

Is your assessment on timing the same for this?

function g!(a,b,rows)
    f!(a,extract_rows(b, rows))
    a
end

@btime g!($(x), $(y), $(rows));
@btime g!($(X), $(Y), $(rows));

  82.361 ns (0 allocations: 0 bytes)
  124.780 ns (0 allocations: 0 bytes)

I am pretty sure this one is the correct way to benchmark, so I’m very unsettled that it gives the exact same results ^^

I’m just happy that, based on your comments, I’ve got a viable solution for my code :)

Is the returned x a typo?

Yes. Edited