3D point rotation

I have a 2D array x of size (r, c), where c = 3*nm, so stacked 3D points. I’m able to rotate it around axis based on some θ value, but this code is not flying very fast. I tried using Quaternions for this but I didn’t find an efficient way either. Any ideas? Thanks!

θ = (θ*π)/180
r,c = size(x)
nm = ndims
x3 = reshape(x,r,3,nm)
point = nanmean.([x3[:,k,:] for k in 1:3])
R = AngleAxis(θ, axis...)
r3 = similar(x3)
point_reshaped = reshape(point, 1, 3)
for j in 1:nm
r3[:,:,j] = (R * (x3[:,:,j] .- point_reshaped)')' .+ point'
end
x2 = reshape(r3,r,c)

Hi, welcome! I’m not familiar with AngleAxis and related and I think it would be beneficial to make your example into a MWE with imports etc so others can chime in with more detailed tips, but I wanted to point you to the performance tips: Performance Tips · The Julia Language.

In particular here: use views instead of slice indexing, and use .= when writing to r3

1 Like

Did you try Rotations.jl with StaticArrays.jl? It is usually better to store each vector as a static vector, and collect all vectors into a dynamic vector.

By the way, I also see a double transpose in there: you may wish to transpose R, point and point_reshaped instead together with some linear algebra to avoid the transposing altogether.

Thanks. Here’s a MWE:

using Rotations,NaNStatistics,BenchmarkTools
function rotate(x,θ,axis)
θ = (θ*π)/180
r,c = size(x)
nm = div(c,3)
x3 = reshape(x,r,3,nm)
point = nanmean.([x3[:,k,:] for k in 1:3])
R = AngleAxis(θ, axis...)
r3 = similar(x3)
point_reshaped = reshape(point, 1, 3)
for j in 1:nm
r3[:,:,j] = (R * (x3[:,:,j] .- point_reshaped)')' .+ point'
end
x2 = reshape(r3,r,c)
end
@btime rotate(randn(20700,63),90,[1 0 0])

68.665 ms (279 allocations: 71.94 MiB)
I tried using StaticArrays.jl, Quaternions.jl (Rotate a vector with a quaternion), inbounds, @views, but the code above was the fastest. Moar performance is needed!

OK, this is a bit faster already:

using Rotations,NaNStatistics,BenchmarkTools,LinearAlgebra
function rotate(x,θ,axis)
    θ = (θ*π)/180
    r,c = size(x)
    nm = div(c,3)
    x3 = reshape(x,r,3,nm)
    point = nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    point_reshaped = reshape(point, 1, 3)
    tmp = Matrix{Float64}(undef,r,3)
    @views for j in 1:nm
        tmp = x3[:,:,j] .- point_reshaped
        mul!(r3[:,:,j], tmp, R')
        r3[:,:,j] .+= point_reshaped
    end
    x2 = reshape(r3,r,c)
end
@btime rotate(randn(20700,63),90,[1 0 0])

4.902 ms (90 allocations: 40.44 MiB)

Presumably you want.= here.


If you go full-on SVector (or Point3f), representing your matrix of 3D points simply using a Matrix{SVector{3, Float64}}, the code becomes much cleaner in my opinion, but apparently slower for reasons I don’t understand.

function rotate_sv(x, θ, axis)
    θ = deg2rad(θ)
    mean_pt = nanmean(x)
    R = AngleAxis(θ, axis...)
    return (Ref(R) .* (x .- Ref(mean_pt))) .+ Ref(mean_pt)
end
julia> θ = 90; axis = SA[1., 0., 0.];

julia> @benchmark rotate(x, $θ, $axis) setup=(x = randn(20700, 63))  # with .=
BenchmarkTools.Trial: 404 samples with 1 evaluation per sample.
 Range (min … max):  5.542 ms … 20.808 ms  ┊ GC (min … max):  0.00% … 72.45%
 Time  (median):     8.297 ms              ┊ GC (median):    30.26%
 Time  (mean ± σ):   8.256 ms ±  1.467 ms  ┊ GC (mean ± σ):  27.20% ± 11.48%

  ▃▃                 ▂██▆▄▁
  ███▇▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁██████▅▇▄▇▅▇▄▅▄▁▁▄▁▁▄▁▁▁▄▄▁▁▁▁▄▁▁▁▁▄▄▅▆ ▇
  5.54 ms      Histogram: log(frequency) by time     13.1 ms <

 Memory estimate: 20.37 MiB, allocs estimate: 21.

julia> @benchmark rotate_sv(x, $θ, $axis) setup=(x = randn(SVector{3, Float64}, 20700, 21))
BenchmarkTools.Trial: 249 samples with 1 evaluation per sample.
 Range (min … max):  12.265 ms … 21.694 ms  ┊ GC (min … max): 0.00% … 11.86%
 Time  (median):     12.631 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.424 ms ±  1.329 ms  ┊ GC (mean ± σ):  4.85% ±  6.81%

  ▄█▇▆▃▂                  ▁▅▃▄▁
  ███████▅▅▆▆▆▇▁▆▅▅▁▅▆▁▅▁▅█████▇▁▇▅▆▇▇▆▅▅▅▁▅▁▁▅▅▁▅▁▁▁▁▁▅▁▁▁▁▅ ▆
  12.3 ms      Histogram: log(frequency) by time      17.4 ms <

 Memory estimate: 9.95 MiB, allocs estimate: 3.

Here I used BLAS.set_num_threads(1) for a fair comparison, but it doesn’t seem to make much of a difference.

1 Like

Some small improvements:

function rotate(x,θ,axis)
    θ = (θ*π)/180
    r,c = size(x)
    nm = div(c,3)
    x3 = reshape(x,r,3,nm)
    point = @views nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    point_reshaped = reshape(point, 1, 3)
    tmp = Matrix{Float64}(undef,r,3)
    @views for j in axes(x3, 3)
        tmp .= x3[:,:,j] .- point_reshaped
        mul!(r3[:,:,j], tmp, R')
        r3[:,:,j] .+= point_reshaped
    end
    x2 = reshape(r3,r,c)
end

It’s also a confounding factor to include the construction of the input vector in the benchmark, so better break it out and also use @btime interpolation. Another small improvement is to give the axis as a tuple.

x=randn(20700, 63);
@btime rotate($x, 90, (1, 0, 0));
1 Like

It seems that *(::AngleAxis, ::SVector{3}) is considerably slower than *(::SMatrix{3, 3}, ::SVector{3}). (Also, using an explicit loop instead of broadcasting seems to be some 10% faster.)

function rotate_sv2(x, θ, axis)
    θ = deg2rad(θ)
    mean_pt = nanmean(x)
    R = SMatrix(AngleAxis(θ, axis...))
    ret = similar(x)
    @inbounds for I = eachindex(x)
        ret[I] = R * (x[I] .- mean_pt) .+ mean_pt
    end
    return ret
end
julia> @benchmark rotate_sv2(x, $θ, $axis) setup=(x = randn(SVector{3, Float64}, 20700, 21))
BenchmarkTools.Trial: 477 samples with 1 evaluation per sample.
 Range (min … max):  2.961 ms … 18.198 ms  ┊ GC (min … max):  0.00% … 83.15%
 Time  (median):     3.078 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   3.949 ms ±  1.520 ms  ┊ GC (mean ± σ):  18.68% ± 19.35%

  █▅                 ▅▃
  ██▇▆▁▁▁▁▄▅▁▄▁▁▁▁▁▁███▇▄▁▆▇▄▅▇▁▁▁▄▁▅▄▁▁▁▁▁▁▄▁▁▁▁▄▁▄▁▅▁▁▁▁▄▅ ▆
  2.96 ms      Histogram: log(frequency) by time      9.5 ms <

 Memory estimate: 9.95 MiB, allocs estimate: 3.
1 Like

I had a few minutes so I played around a little bit: you can remove some redundant computations by doing some linear algebra (see rotate3 below), but overall performance didn’t change on my device. You can probably squeeze out more by using more efficient memory access (you are iterating over the third axis now).

using Rotations, NaNStatistics, BenchmarkTools, LinearAlgebra, Random
Random.seed!(42) 

function rotate(x,θ,axis)
    θ = (θ*π)/180
    r,c = size(x)
    nm = div(c,3)
    x3 = reshape(x,r,3,nm)
    point = nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    point_reshaped = reshape(point, 1, 3)
    tmp = Matrix{Float64}(undef,r,3)
    @views for j in 1:nm
        tmp = x3[:,:,j] .- point_reshaped
        mul!(r3[:,:,j], tmp, R')
        r3[:,:,j] .+= point_reshaped
    end
    x2 = reshape(r3,r,c)
end
function rotate2(x,θ,axis)
    θ = (θ*π)/180
    r,c = size(x)
    nm = div(c,3)
    x3 = reshape(x,r,3,nm)
    point = @views nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    point_reshaped = reshape(point, 1, 3)
    tmp = Matrix{Float64}(undef,r,3)
    @views for j in axes(x3, 3)
        tmp .= x3[:,:,j] .- point_reshaped
        mul!(r3[:,:,j], tmp, R')
        r3[:,:,j] .+= point_reshaped
    end
    x2 = reshape(r3,r,c)
end
function rotate3(x,θ,axis)
    θ = (θ*π)/180
    r,c = size(x)
    nm = div(c,3)
    x3 = reshape(x,r,3,nm)
    point = @views nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    tmp = Matrix{Float64}(undef,r,3)
    @views for j in axes(x3, 3)
        tmp .= x3[:,:,j]
        mul!(r3[:,:,j], tmp, R')
    end
    r3 .+= reshape(point, 1, 3) - reshape(point, 1, 3) * R'
    x2 = reshape(r3,r,c)
end

function run_test()
    x = randn(20700,63)

    y1 = rotate(x, 90, [1 0 0]) # warmup
    y2 = rotate2(x, 90, [1 0 0]) # warmup
    y3 = rotate3(x, 90, [1 0 0]) # warmup
    @assert all(y1 .≈ y2 .≈ y3)
    display("all versions match")
    display(@benchmark rotate($x, 90, [1 0 0]))
    display(@benchmark rotate2($x, 90, [1 0 0]))
    display(@benchmark rotate3($x, 90, [1 0 0]))
end
run_test()
1 Like

Can you paste a MWE too? Thanks!

This is ideal, considering the slices x3[:,:,j] and r3[:,:,j] used in the matrix multiplication are then contiguous in memory.

Except for the imports from your code and an extra using StaticArrays, I’m not sure what you’re missing for my code above to already constitute a MWE?

I’m not able to run that code. Is x supposed to be a 2D matrix? nanmean(x) gives a scalar.