Fastest Way to Carry Out Multiple Cross Products

Hi everyone, I am moving some code from Matlab to Julia and I need to carry out a large number (N) of cross products (at least ten thousand). In Matlab, I had two 3xN matrices and I could use cross(omega,pmat) to take the cross product of all vectors. In Julia I understand it is not possible to do this and my searches have led me to

omegapcross= [cross(omega[:,i],pmat[:,i]) for i=1:size(pmat)[2]]

However, this is quite slow compared to the same operation in Matlab. What is the fastest way I can do it?

The first reason that’s slow is that omega[:,i] and pmat[:,i] allocate an intermediate arrays. You can avoid that by adding @views in from of the operation:

julia> x = rand(3,10_000); y = rand(3, 10_000);

julia> f1(x,y) = [ cross(x[:,i],y[:,i]) for i in axes(x,2) ]
f1 (generic function with 1 method)

julia> @btime f1($x,$y);
  484.221 μs (30002 allocations: 2.37 MiB)

julia> f2(x,y) = @views [ cross(x[:,i],y[:,i]) for i in axes(x,2) ]
f2 (generic function with 1 method)

julia> @btime f2($x,$y);
  173.766 μs (10002 allocations: 859.42 KiB)

The second reason is that the cross operation allocates a new array for each result. If you are working with coordinates (as the 3,N dimensions suggest), a better alternative is to use static vectors:

julia> using StaticArrays

julia> x = rand(SVector{3,Float64}, 10_000); y = rand(SVector{3,Float64}, 10_000);

julia> f1(x,y) = [ cross(x[i],y[i]) for i in eachindex(x, y) ]
f1 (generic function with 1 method)

julia> @btime f1($x,$y);
  12.347 μs (2 allocations: 234.42 KiB)

edit: You can also use static vectors as intermediates, and keep the matrix format:

julia> function f2(x,y)
           z = similar(x)
           for i in axes(x,2)
               vx = SVector{3}(@view(x[:,i]))
               vy = SVector{3}(@view(y[:,i]))
               vz = cross(vx,vy)
               z[:,i] .= vz
           end
           return z
       end
f2 (generic function with 1 method)

julia> @btime f2($x,$y);
  10.123 μs (2 allocations: 234.42 KiB)

edit2: You can also get some decent speedup using Polyester.@batch to parallelize that:

julia> import Polyester

# pure SVector version:
julia> function f3(x,y)
           z = similar(x)
           @batch for i in eachindex(x,y)
               z[i] = cross(x[i],y[i])
           end
           return z
       end
f3 (generic function with 1 method)

julia> @btime f3($x,$y);
  2.926 μs (2 allocations: 234.42 KiB)

# matrix result version
julia> function f2(x,y)
           z = similar(x)
           Polyester.@batch for i in axes(x,2)
               vx = SVector{3}(@view(x[:,i]))
               vy = SVector{3}(@view(y[:,i]))
               vz = cross(vx,vy)
               z[:,i] .= vz
           end
           return z
       end
f2 (generic function with 1 method)

julia> @btime f2($x,$y);
  3.142 μs (2 allocations: 234.42 KiB)
8 Likes

Do you by any chance run this at toplevel in a script? Then you will have a repeated access to globals which is very slow in Julia. I’d generally recommend checking out the Performance Tips for more such footguns. There is also a section of the manual on Noteworthy differences to Matlab which might be helpful for you.

You are copying a lot of data here with the slices omega[:,i]. Better would be to use views (look at eachcol for example). To fix both the repeatd access to globals and the copying, you can exploit broadcasting by doing:

omegapcross = cross.(eachcol(omega), eachcol(pmat))

Be aware, that the result will be a vector of vectors not a matrix. I don’t know whether there is a better way that works faster out-of-box, but you could certainly write out the loop and the operation and use something like LoopVectorization.jl to achieve great speedups. Or use StaticArrays like @lmiq

4 Likes

Thanks for all the suggestions. Using static vectors makes the most sense for my application but I cannot add a scalar to these vectors. Take

aa= rand(SVector{3,Float64},2)

Then I try

2.0.+aa

This is probably something very simple but I couldn’t find the answer with a web search.

I am running this inside a function. I had tried eachcol() but there wasn’t too much of an improvement.

The difference is that now instead of being a 2d matrix, it’s a vector of (static) vectors, so you need to do something like

map(a -> 2.0 .+ a, aa)

You can also check out HybridArrays.jl for an example of a package that can make a 2D array where one dimension is statically sized if you dont dont to work with vectors of (static) vectors

2 Likes

Or thinking it more precisely, you probably want to add a vector with equal coordinates to each of the vectors of a:

julia> a = rand(SVector{3,Float64}, 2)
2-element Vector{SVector{3, Float64}}:
 [0.08430971266969611, 0.8600992178391528, 0.556369252768178]
 [0.8610366199372764, 0.5569556755971814, 0.2721240449230232]

julia> Ref(SVector{3}(1.0,2.0,3.0)) .+ a
2-element Vector{SVector{3, Float64}}:
 [1.0843097126696961, 2.860099217839153, 3.556369252768178]
 [1.8610366199372765, 2.556955675597181, 3.2721240449230233]

The reason you need the Ref() there is to avoid broacasting over the components of the first vector (which will result in an error, since the dimension of the second vector is different).

And you can add a scalar with broadcasting to each element of a, in a loop:

julia> a = rand(SVector{3,Float64}, 2)
2-element Vector{SVector{3, Float64}}:
 [0.8969818867149998, 0.18369648340096167, 0.34443677337581036]
 [0.5491626841697853, 0.038240828938052474, 0.0062081732030487835]

julia> for i in eachindex(a)
           a[i]  = a[i] .+ 2.0
       end

julia> a
2-element Vector{SVector{3, Float64}}:
 [2.896981886715, 2.183696483400962, 2.3444367733758105]
 [2.549162684169785, 2.0382408289380525, 2.006208173203049]
1 Like

Thank you all for the alternatives!

This seems like a job for map!

julia> using LinearAlgebra

julia> a = rand(3, 20)
3×20 Matrix{Float64}:
 0.462611   0.849549  0.134681   0.239544  0.272062  0.525344  0.282463  0.28872   0.964929  …  0.622899  0.729477  0.17297    0.658666  0.915017  0.0590547  0.104249  0.937264
 0.0208038  0.366226  0.0084185  0.394584  0.766879  0.294232  0.16829   0.238005  0.681012     0.318527  0.693917  0.0545056  0.275323  0.664675  0.982157   0.312104  0.932436
 0.955077   0.269761  0.391113   0.98473   0.275616  0.61177   0.51022   0.74244   0.465792     0.522879  0.449996  0.809474   0.945416  0.44689   0.845018   0.695358  0.699771

julia> b = rand(3, 20)
3×20 Matrix{Float64}:
 0.00243973  0.229969  0.970546  0.558336  0.403178  0.21809      0.855038  0.325757  0.805477  …  0.280584   0.333554  0.941193  0.966782  0.975886   0.235745  0.698962  0.439919
 0.591027    0.308577  0.282836  0.165541  0.777384  0.00779927   0.815078  0.389583  0.900161     0.382024   0.664833  0.161192  0.835962  0.284416   0.521869  0.19169   0.145034
 0.849143    0.574172  0.968826  0.873323  0.577932  0.000268978  0.579341  0.279071  0.271251     0.0224422  0.620421  0.14151   0.750735  0.0119086  0.768155  0.906703  0.329199

julia> c = zeros(3, 20)
3×20 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

julia> map!(cross, eachcol(c), eachcol(a), eachcol(b))
20-element ColumnSlices{Matrix{Float64}, Tuple{Base.OneTo{Int64}}, SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}}:
 [-0.5468106907933866, -0.3904928283306145, 0.2733646418937745]
 [0.12703461876689967, -0.4257501381729314, 0.17793088170116877]
 [-0.10246492394240223, 0.2491106409976227, 0.02992224870189574]
 [0.18158607661793907, 0.3406110890154618, -0.18065614416157888]
 [0.22894455070898848, -0.04611112018341028, -0.09769199737687165]
 [-0.004692217551688816, 0.13327963858059308, -0.0600717837979185]
 [-0.3183716819343667, 0.27261490833224306, 0.086334860419883]
 [-0.22282146214369247, 0.16128144811003398, 0.0349485218217349]
 [-0.23456312565647414, 0.11344696034898732, 0.32005245492534096]
 [-0.12072988234185508, 0.2604202754144202, -0.2114983781455344]
 [0.8912161313006809, 0.480220100015665, -0.9161119375703564]
 [0.10852389674645324, -0.09570452165141362, -0.04905015483945689]
 [-0.19260373276727746, 0.13273208855682825, 0.14858870734630042]
 [0.13134796622200579, -0.3024845129431062, 0.2535215794747741]
 [-0.12276791827036129, 0.7373941643644051, -0.023418835382492603]
 [-0.5836373771645582, 0.4195269604916654, 0.28444309430776904]
 [-0.11918715428655909, 0.42521673617797623, -0.38840153852984904]
 [0.3134600238337368, 0.15384560686200935, -0.20071973332650211]
 [0.14969301639432162, 0.3915058397836165, -0.19816571325844537]
 [0.20546669200074208, -0.0007044733104026313, -0.2742608543616962]

julia> c
3×20 Matrix{Float64}:
 -0.546811   0.127035  -0.102465    0.181586   0.228945   -0.00469222  -0.318372   -0.222821   …   0.131348  -0.122768   -0.583637  -0.119187   0.31346    0.149693   0.205467
 -0.390493  -0.42575    0.249111    0.340611  -0.0461111   0.13328      0.272615    0.161281      -0.302485   0.737394    0.419527   0.425217   0.153846   0.391506  -0.000704473
  0.273365   0.177931   0.0299222  -0.180656  -0.097692   -0.0600718    0.0863349   0.0349485      0.253522  -0.0234188   0.284443  -0.388402  -0.20072   -0.198166  -0.274261

3 Likes

Unfortunately it doesn’t avoid the allocation of the intermediate vectors:

julia> function f4(x,y)
           z = similar(x)
           map!(cross, eachcol(z), eachcol(x), eachcol(y))
           return z
       end
f4 (generic function with 1 method)

julia> @btime f4($x,$y)
  202.182 μs (10002 allocations: 1015.67 KiB)

or

cross.(x, y)
2 Likes

These allocations are very annoying.

One method to get rid of them:

julia> using StaticArrays

# copied from original implementation

julia> function cross3(a::AbstractVector, b::AbstractVector)
           if !(length(a) == length(b) == 3)
               throw(DimensionMismatch("cross product is only defined for vectors of length 3"))
           end
           a1, a2, a3 = a
           b1, b2, b3 = b
           @SVector [a2*b3-a3*b2, a3*b1-a1*b3, a1*b2-a2*b1]
       end
cross3 (generic function with 1 method)

julia> function f4(x,y)
           z = similar(x)
           map!(cross3, eachcol(z), eachcol(x), eachcol(y))
           return z
       end
f4 (generic function with 1 method)

julia> @btime f4($x,$y);
  63.956 μs (2 allocations: 234.42 KiB)

It looks natural to return fixed size SVector as the output has a fixed size. And it works quite well inference-wise. But I wish there was a better way.

1 Like

If you are using StaticArrays, you can do:

julia> x = rand(3,10_000); y = rand(3,10_000);

julia> cross3(x,y) = cross(SVector{3,eltype(x)}(x), SVector{3,eltype(y)}(y))
cross3 (generic function with 1 method)

julia> @btime f4($x,$y)
  26.972 μs (2 allocations: 234.42 KiB)
3×10000 Matrix{Float64}:
 -0.0435722   0.141681   0.0206681    -0.0756114   0.52785   …  -0.1911    -0.106267   0.682027   0.495938  -0.249367
 -0.404147   -0.375406   0.000736051   0.0430684   0.314649     -0.112023  -0.219613  -0.926107  -0.59346    0.166233
  0.515854    0.605492  -0.0948584     0.0715813  -0.392803      0.127669   0.155209   0.284226   0.811831  -0.0517297
1 Like