The first reason that’s slow is that omega[:,i]
and pmat[:,i]
allocate an intermediate arrays. You can avoid that by adding @views
in from of the operation:
julia> x = rand(3,10_000); y = rand(3, 10_000);
julia> f1(x,y) = [ cross(x[:,i],y[:,i]) for i in axes(x,2) ]
f1 (generic function with 1 method)
julia> @btime f1($x,$y);
484.221 μs (30002 allocations: 2.37 MiB)
julia> f2(x,y) = @views [ cross(x[:,i],y[:,i]) for i in axes(x,2) ]
f2 (generic function with 1 method)
julia> @btime f2($x,$y);
173.766 μs (10002 allocations: 859.42 KiB)
The second reason is that the cross
operation allocates a new array for each result. If you are working with coordinates (as the 3,N
dimensions suggest), a better alternative is to use static vectors:
julia> using StaticArrays
julia> x = rand(SVector{3,Float64}, 10_000); y = rand(SVector{3,Float64}, 10_000);
julia> f1(x,y) = [ cross(x[i],y[i]) for i in eachindex(x, y) ]
f1 (generic function with 1 method)
julia> @btime f1($x,$y);
12.347 μs (2 allocations: 234.42 KiB)
edit: You can also use static vectors as intermediates, and keep the matrix format:
julia> function f2(x,y)
z = similar(x)
for i in axes(x,2)
vx = SVector{3}(@view(x[:,i]))
vy = SVector{3}(@view(y[:,i]))
vz = cross(vx,vy)
z[:,i] .= vz
end
return z
end
f2 (generic function with 1 method)
julia> @btime f2($x,$y);
10.123 μs (2 allocations: 234.42 KiB)
edit2: You can also get some decent speedup using Polyester.@batch
to parallelize that:
julia> import Polyester
# pure SVector version:
julia> function f3(x,y)
z = similar(x)
@batch for i in eachindex(x,y)
z[i] = cross(x[i],y[i])
end
return z
end
f3 (generic function with 1 method)
julia> @btime f3($x,$y);
2.926 μs (2 allocations: 234.42 KiB)
# matrix result version
julia> function f2(x,y)
z = similar(x)
Polyester.@batch for i in axes(x,2)
vx = SVector{3}(@view(x[:,i]))
vy = SVector{3}(@view(y[:,i]))
vz = cross(vx,vy)
z[:,i] .= vz
end
return z
end
f2 (generic function with 1 method)
julia> @btime f2($x,$y);
3.142 μs (2 allocations: 234.42 KiB)