I ran some experiments to benchmark @rdeits' solution. The results depend somewhat on whether you can prepare your `x` data in the right shape beforehand (already transposed, or already a `Vector` of `StaticVector`s).
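For reference, here is a minimal sketch of preparing the data in those two shapes. It assumes Julia 1.6 or later, where `reinterpret(reshape, T, A)` is available. Unlike the plain `reinterpret` used in the MWE, it drops the leading dimension and yields a genuinely one-dimensional vector. The variable names are only illustrative.

```julia
using StaticArrays

N = 1000
x = rand(N, 4)                          # N×4: one candidate row per matrix row
xt = permutedims(x)                     # 4×N: each original row is now a contiguous column

# Zero-copy view: the length-4 leading dimension is absorbed into the element type,
# giving an N-element AbstractVector{SVector{4,Float64}} that shares memory with xt.
cols = reinterpret(reshape, SVector{4,Float64}, xt)

# Materialized Vector{SVector{4,Float64}}: one copy, done once before any searching.
xts = collect(cols)

target = SVector{4,Float64}(x[div(N, 2), :])
i = findfirst(isequal(target), xts)     # index of the first matching row, or nothing
```

Paying for the transpose/copy once is what the "already transposed" and "pre-constructed" variants in the timings below assume.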
I was a bit surprised that the vector comparison is not automatically vectorized.
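One guess, which I have not verified, is that `==` on vectors can stop at the first differing component, and a data-dependent exit like that makes vectorization harder. The sketch below shows a branch-free row test. The helper names `eq_nobranch` and `find_row_nobranch` are hypothetical, and whether this is actually faster than `isequal` on `SVector`s would need its own benchmark.

```julia
using StaticArrays

# Hypothetical helper: compare all components without short-circuiting.
# `a .== b` yields an SVector{K,Bool}; `reduce(&, ...)` folds it with the non-short-circuiting `&`.
@inline eq_nobranch(a::SVector{K,T}, b::SVector{K,T}) where {K,T} = reduce(&, a .== b)

# Hypothetical search loop: early exit only between rows, never inside a row comparison.
function find_row_nobranch(v::AbstractVector{SVector{K,T}}, t::SVector{K,T}) where {K,T}
    @inbounds for i in eachindex(v)
        eq_nobranch(v[i], t) && return i
    end
    return nothing
end

v = [SVector(rand(), rand(), rand(), rand()) for _ in 1:100]
idx = find_row_nobranch(v, v[42])       # 42, assuming no duplicate rows
```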
Here are the timings I get for the different functions (each taking differently prepared data):
For 1,000,000 rows of length 4:

```
3.749 ms    (500002 allocations: 22.89 MiB)  # find_vector         (no transpose)
1.987 ms    (3 allocations: 64 bytes)        # find_vector_sa      (transpose + SVector reinterpret, @rdeits' latest solution)
1.419 ms    (2 allocations: 48 bytes)        # find_vector_sa_tr   (data already transposed, SVector reinterpret)
1.292 ms    (1 allocation: 32 bytes)         # find_vector_sa_tr2  (explicit SVector conversion of the target)
1.276 ms    (0 allocations: 0 bytes)         # find_vector_sa_trs  (pre-constructed Vector of SVectors)
867.237 μs  (0 allocations: 0 bytes)         # find_vector_sa_trs2 (SVectors + @simd loop)
```
I guess the allocations make more of a difference when the number of rows is small.
For 30 rows of length 4:
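```
154.493 ns  (17 allocations: 864 bytes)  # find_vector
86.302 ns   (3 allocations: 64 bytes)    # find_vector_sa
60.622 ns   (2 allocations: 48 bytes)    # find_vector_sa_tr
52.229 ns   (1 allocation: 32 bytes)     # find_vector_sa_tr2
42.577 ns   (0 allocations: 0 bytes)     # find_vector_sa_trs
21.645 ns   (0 allocations: 0 bytes)     # find_vector_sa_trs2
```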
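To see where the allocations come from without a full benchmark, one option is Base's `@allocated` on a warmed-up call inside a function. This is a minimal sketch assuming row length 4. The function names are hypothetical, and whether the on-the-fly `reinterpret` wrapper gets heap-allocated may depend on the Julia version.

```julia
using StaticArrays

# Hypothetical helper: reinterpret the 4×N matrix on every call, then search.
search_on_the_fly(xt::Matrix{Float64}, t::SVector{4,Float64}) =
    findfirst(isequal(t), reinterpret(reshape, SVector{4,Float64}, xt))

# Hypothetical helper: search an already-built Vector of SVectors.
search_prebuilt(xts::Vector{SVector{4,Float64}}, t::SVector{4,Float64}) =
    findfirst(isequal(t), xts)

# Call once so compilation is not counted, then report the bytes allocated by a second call.
function bytes_allocated(f, data, t)
    f(data, t)
    return @allocated f(data, t)
end

xt = rand(4, 30)                        # 30 rows of length 4, stored as columns
xts = collect(reinterpret(reshape, SVector{4,Float64}, xt))
t = xts[15]

println("on the fly: ", bytes_allocated(search_on_the_fly, xt, t), " bytes")
println("prebuilt:   ", bytes_allocated(search_prebuilt, xts, t), " bytes")
```

With only 30 rows the search itself takes tens of nanoseconds, so even a single small allocation per call is a visible fraction of the total.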
The MWE
```julia
using StaticArrays
using BenchmarkTools
using LinearAlgebra   # for dot

const L = 4           # row length

function allbench_findrow(N)
    x = rand(N, L)                                        # N×L: one candidate per row
    xt = collect(transpose(x))                            # L×N: each row is a contiguous column
    xts = collect(reinterpret(SVector{L, Float64}, xt))   # 1×N array of SVectors, built once
    target = x[div(N, 2), :]

    @show target
    @show find_vector(x, target, N)
    @show find_vector_sa(x, target, N)
    @show find_vector_sa_tr(xt, target, N)
    @show find_vector_sa_tr2(xt, target, N)
    @show find_vector_sa_trs(xts, target, N)
    @show find_vector_sa_trs2(xts, target, N)

    @assert target == find_vector(x, target, N)
    @assert target == find_vector_sa(x, target, N)
    @assert target == find_vector_sa_tr(xt, target, N)
    @assert target == find_vector_sa_tr2(xt, target, N)
    @assert target == find_vector_sa_trs(xts, target, N)
    @assert target == find_vector_sa_trs2(xts, target, N)

    @btime find_vector($x, $target, $N)
    @btime find_vector_sa($x, $target, $N)
    @btime find_vector_sa_tr($xt, $target, $N)
    @btime find_vector_sa_tr2($xt, $target, $N)
    @btime find_vector_sa_trs($xts, $target, $N)
    @btime find_vector_sa_trs2($xts, $target, $N)
end

# Baseline: compare each row (as a view) with the target.
function find_vector(x, target, N)
    I = findfirst(axes(x, 1)) do i
        @view(x[i, :]) == target
    end
    return x[I, :]
end

# Transpose lazily, then reinterpret each column as an SVector (@rdeits' solution).
function find_vector_sa(x, target, N)
    xt = transpose(x)
    columns = reinterpret(SVector{L, Float64}, xt)
    I = findfirst(isequal(target), columns)
    return columns[I]
end

# Data already transposed (L×N): reinterpret without copying.
function find_vector_sa_tr(xt, target, N)
    columns = reinterpret(SVector{L, Float64}, xt)
    I = findfirst(isequal(target), columns)
    return columns[I]
end

# Same as above, but convert the target to an SVector first.
function find_vector_sa_tr2(xt, target, N)
    columns = reinterpret(SVector{L, Float64}, xt)
    starget = SVector{L, Float64}(target)
    I = findfirst(isequal(starget), columns)
    return columns[I]
end

# Data pre-converted to an array of SVectors.
function find_vector_sa_trs(xts, target, N)
    starget = SVector{L, Float64}(target)
    I = findfirst(isequal(starget), xts)
    return xts[I]
end

# Hand-written loop: a zero squared distance is taken to mean the row matches.
function find_vector_sa_trs2(xts, target, N)
    starget = SVector{L, Float64}(target)
    @inbounds @simd for i in 1:N
        n = xts[1, i] - starget
        dot(n, n) == 0.0 && return xts[1, i]
    end
    return nothing
end

allbench_findrow(1000000)
```
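One caveat about the squared-distance test in `find_vector_sa_trs2`: `dot(n, n) == 0.0` is not exactly the same check as `==`. If every component of the difference is smaller in magnitude than roughly 1e-162, its square underflows to zero, so two distinct rows can be reported as equal. With `rand` data this is practically impossible, but here is a small demonstration:

```julia
using StaticArrays
using LinearAlgebra

a = SVector(1e-170, 0.0, 0.0, 0.0)
b = SVector(0.0, 0.0, 0.0, 0.0)
n = a - b

dot(n, n) == 0.0   # true: (1e-170)^2 underflows to 0.0
a == b             # false: the rows really differ
iszero(n)          # false: exact per-component test on the difference
```

If exactness matters, `xts[1, i] == starget` (or `iszero(n)`) in the loop gives the same answer as the other variants.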