3D point rotation

I have a 2D array x of size (r, c), where c = 3*nm, i.e. nm stacked 3D points per row. I can rotate it around an axis by some angle θ, but this code is not very fast. I tried using quaternions for this but didn’t find an efficient way either. Any ideas? Thanks!

θ = (θ*π)/180
r, c = size(x)
nm = div(c, 3)
x3 = reshape(x, r, 3, nm)
point = nanmean.([x3[:,k,:] for k in 1:3])
R = AngleAxis(θ, axis...)
r3 = similar(x3)
point_reshaped = reshape(point, 1, 3)
for j in 1:nm
    r3[:,:,j] = (R * (x3[:,:,j] .- point_reshaped)')' .+ point'
end
x2 = reshape(r3, r, c)

Hi, welcome! I’m not familiar with AngleAxis and related functions, and I think it would be beneficial to turn your example into an MWE with imports etc. so others can chime in with more detailed tips. In the meantime, I wanted to point you to the performance tips: Performance Tips · The Julia Language.

In particular here: use views instead of slice indexing, and use .= when writing to r3

1 Like

Did you try Rotations.jl with StaticArrays.jl? It is usually better to store each vector as a static vector, and collect all vectors into a dynamic vector.

By the way, I also see a double transpose in there: with a bit of linear algebra you can transpose R, point and point_reshaped instead, and avoid transposing the data altogether.
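For example, since (R * A')' == A * R' for any real matrix A, the per-slice update can be written without transposing the point data. A minimal sketch, using a plain 3×3 rotation matrix as a stand-in for AngleAxis (which is not loaded here):

```julia
θ = π / 2
R = [1 0 0; 0 cos(θ) -sin(θ); 0 sin(θ) cos(θ)]  # rotation about the x-axis
A = randn(5, 3)   # 5 points stored as rows
p = randn(1, 3)   # centroid as a row vector

v1 = (R * (A .- p)')' .+ p   # double-transpose form from the question
v2 = (A .- p) * R' .+ p      # equivalent, transpose-free form
v1 ≈ v2                      # true
```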

Thanks. Here’s a MWE:

using Rotations, NaNStatistics, BenchmarkTools
function rotate(x, θ, axis)
    θ = (θ*π)/180
    r, c = size(x)
    nm = div(c, 3)
    x3 = reshape(x, r, 3, nm)
    point = nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    point_reshaped = reshape(point, 1, 3)
    for j in 1:nm
        r3[:,:,j] = (R * (x3[:,:,j] .- point_reshaped)')' .+ point'
    end
    x2 = reshape(r3, r, c)
end
@btime rotate(randn(20700,63), 90, [1 0 0])

68.665 ms (279 allocations: 71.94 MiB)
I tried StaticArrays.jl, Quaternions.jl (Rotate a vector with a quaternion), @inbounds, and @views, but the code above was the fastest. Moar performance is needed!

OK, this is a bit faster already:

using Rotations, NaNStatistics, BenchmarkTools, LinearAlgebra
function rotate(x,θ,axis)
    θ = (θ*π)/180
    r,c = size(x)
    nm = div(c,3)
    x3 = reshape(x,r,3,nm)
    point = nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    point_reshaped = reshape(point, 1, 3)
    tmp = Matrix{Float64}(undef,r,3)
    @views for j in 1:nm
        tmp = x3[:,:,j] .- point_reshaped
        mul!(r3[:,:,j], tmp, R')
        r3[:,:,j] .+= point_reshaped
    end
    x2 = reshape(r3,r,c)
end
@btime rotate(randn(20700,63),90,[1 0 0])

4.902 ms (90 allocations: 40.44 MiB)

Presumably you want .= here.


If you go full-on SVector (or Point3f), representing your matrix of 3D points simply using a Matrix{SVector{3, Float64}}, the code becomes much cleaner in my opinion, but apparently slower for reasons I don’t understand.

function rotate_sv(x, θ, axis)
    θ = deg2rad(θ)
    mean_pt = nanmean(x)
    R = AngleAxis(θ, axis...)
    return (Ref(R) .* (x .- Ref(mean_pt))) .+ Ref(mean_pt)
end
julia> θ = 90; axis = SA[1., 0., 0.];

julia> @benchmark rotate(x, $θ, $axis) setup=(x = randn(20700, 63))  # with .=
BenchmarkTools.Trial: 404 samples with 1 evaluation per sample.
 Range (min … max):  5.542 ms … 20.808 ms  ┊ GC (min … max):  0.00% … 72.45%
 Time  (median):     8.297 ms              ┊ GC (median):    30.26%
 Time  (mean ± σ):   8.256 ms ±  1.467 ms  ┊ GC (mean ± σ):  27.20% ± 11.48%

  ▃▃                 ▂██▆▄▁
  ███▇▁▆▁▁▁▁▁▁▁▁▁▁▁▁▁██████▅▇▄▇▅▇▄▅▄▁▁▄▁▁▄▁▁▁▄▄▁▁▁▁▄▁▁▁▁▄▄▅▆ ▇
  5.54 ms      Histogram: log(frequency) by time     13.1 ms <

 Memory estimate: 20.37 MiB, allocs estimate: 21.

julia> @benchmark rotate_sv(x, $θ, $axis) setup=(x = randn(SVector{3, Float64}, 20700, 21))
BenchmarkTools.Trial: 249 samples with 1 evaluation per sample.
 Range (min … max):  12.265 ms … 21.694 ms  ┊ GC (min … max): 0.00% … 11.86%
 Time  (median):     12.631 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.424 ms ±  1.329 ms  ┊ GC (mean ± σ):  4.85% ±  6.81%

  ▄█▇▆▃▂                  ▁▅▃▄▁
  ███████▅▅▆▆▆▇▁▆▅▅▁▅▆▁▅▁▅█████▇▁▇▅▆▇▇▆▅▅▅▁▅▁▁▅▅▁▅▁▁▁▁▁▅▁▁▁▁▅ ▆
  12.3 ms      Histogram: log(frequency) by time      17.4 ms <

 Memory estimate: 9.95 MiB, allocs estimate: 3.

Here I used BLAS.set_num_threads(1) for a fair comparison, but it doesn’t seem to make much of a difference.

1 Like

Some small improvements:

function rotate(x,θ,axis)
    θ = (θ*π)/180
    r,c = size(x)
    nm = div(c,3)
    x3 = reshape(x,r,3,nm)
    point = @views nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    point_reshaped = reshape(point, 1, 3)
    tmp = Matrix{Float64}(undef,r,3)
    @views for j in axes(x3, 3)
        tmp .= x3[:,:,j] .- point_reshaped
        mul!(r3[:,:,j], tmp, R')
        r3[:,:,j] .+= point_reshaped
    end
    x2 = reshape(r3,r,c)
end

It’s also a confounding factor to include the construction of the input array in the benchmark, so it’s better to break it out and use @btime interpolation. Another small improvement is to pass the axis as a tuple.

x=randn(20700, 63);
@btime rotate($x, 90, (1, 0, 0));
1 Like

It seems that *(::AngleAxis, ::SVector{3}) is considerably slower than *(::SMatrix{3, 3}, ::SVector{3}). (Also, using an explicit loop instead of broadcasting seems to be some 10% faster.)

function rotate_sv2(x, θ, axis)
    θ = deg2rad(θ)
    mean_pt = nanmean(x)
    R = SMatrix(AngleAxis(θ, axis...))
    ret = similar(x)
    @inbounds for I = eachindex(x)
        ret[I] = R * (x[I] .- mean_pt) .+ mean_pt
    end
    return ret
end
julia> @benchmark rotate_sv2(x, $θ, $axis) setup=(x = randn(SVector{3, Float64}, 20700, 21))
BenchmarkTools.Trial: 477 samples with 1 evaluation per sample.
 Range (min … max):  2.961 ms … 18.198 ms  ┊ GC (min … max):  0.00% … 83.15%
 Time  (median):     3.078 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   3.949 ms ±  1.520 ms  ┊ GC (mean ± σ):  18.68% ± 19.35%

  █▅                 ▅▃
  ██▇▆▁▁▁▁▄▅▁▄▁▁▁▁▁▁███▇▄▁▆▇▄▅▇▁▁▁▄▁▅▄▁▁▁▁▁▁▄▁▁▁▁▄▁▄▁▅▁▁▁▁▄▅ ▆
  2.96 ms      Histogram: log(frequency) by time      9.5 ms <

 Memory estimate: 9.95 MiB, allocs estimate: 3.
1 Like

I had a few minutes so I played around a little bit: you can remove some redundant computations by doing some linear algebra (see rotate3 below), but overall performance didn’t change on my device. You can probably squeeze out more by using more efficient memory access (you are iterating over the third axis now).

using Rotations, NaNStatistics, BenchmarkTools, LinearAlgebra, Random
Random.seed!(42) 

function rotate(x,θ,axis)
    θ = (θ*π)/180
    r,c = size(x)
    nm = div(c,3)
    x3 = reshape(x,r,3,nm)
    point = nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    point_reshaped = reshape(point, 1, 3)
    tmp = Matrix{Float64}(undef,r,3)
    @views for j in 1:nm
        tmp = x3[:,:,j] .- point_reshaped
        mul!(r3[:,:,j], tmp, R')
        r3[:,:,j] .+= point_reshaped
    end
    x2 = reshape(r3,r,c)
end
function rotate2(x,θ,axis)
    θ = (θ*π)/180
    r,c = size(x)
    nm = div(c,3)
    x3 = reshape(x,r,3,nm)
    point = @views nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    point_reshaped = reshape(point, 1, 3)
    tmp = Matrix{Float64}(undef,r,3)
    @views for j in axes(x3, 3)
        tmp .= x3[:,:,j] .- point_reshaped
        mul!(r3[:,:,j], tmp, R')
        r3[:,:,j] .+= point_reshaped
    end
    x2 = reshape(r3,r,c)
end
function rotate3(x,θ,axis)
    θ = (θ*π)/180
    r,c = size(x)
    nm = div(c,3)
    x3 = reshape(x,r,3,nm)
    point = @views nanmean.([x3[:,k,:] for k in 1:3])
    R = AngleAxis(θ, axis...)
    r3 = similar(x3)
    tmp = Matrix{Float64}(undef,r,3)
    @views for j in axes(x3, 3)
        tmp .= x3[:,:,j]
        mul!(r3[:,:,j], tmp, R')
    end
    r3 .+= reshape(point, 1, 3) - reshape(point, 1, 3) * R'
    x2 = reshape(r3,r,c)
end

function run_test()
    x = randn(20700,63)

    y1 = rotate(x, 90, [1 0 0]) # warmup
    y2 = rotate2(x, 90, [1 0 0]) # warmup
    y3 = rotate3(x, 90, [1 0 0]) # warmup
    @assert all(y1 .≈ y2 .≈ y3)
    display("all versions match")
    display(@benchmark rotate($x, 90, [1 0 0]))
    display(@benchmark rotate2($x, 90, [1 0 0]))
    display(@benchmark rotate3($x, 90, [1 0 0]))
end
run_test()
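The linear-algebra step that rotate3 exploits is just distributivity: R(x − p) + p = Rx + (p − Rp), so the centering and un-centering collapse into one constant shift applied once, outside the loop. A quick sanity check, with a plain rotation matrix standing in for AngleAxis:

```julia
θ = π / 3
R = [cos(θ) -sin(θ) 0; sin(θ) cos(θ) 0; 0 0 1]  # rotation about the z-axis
x = randn(3)   # a point
p = randn(3)   # the rotation center
shift = p - R * p                # constant, independent of x
R * (x - p) + p ≈ R * x + shift  # true
```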
1 Like

Can you paste a MWE too? Thanks!

This is ideal, considering the slices x3[:,:,j] and r3[:,:,j] used in the matrix multiplication are then contiguous in memory.

Except for the imports from your code and an extra using StaticArrays, I’m not sure what you’re missing for my code above to already constitute a MWE?

I’m not able to run that code. Is x supposed to be a 2D matrix? nanmean(x) gives a scalar.

Well, x is a Matrix{SVector{3, Float64}}, like the ones created in the @benchmark’s setup:

x = randn(SVector{3, Float64}, 20700, 21)

Note in particular that the memory order is different between your x in rotate and mine in rotate_sv/rotate_sv2 (i.e.

julia> size(reinterpret(Float64, x))  # with x the `Matrix{SVector}` above
(62100, 21)

julia> size(reinterpret(reshape, Float64, x))
(3, 20700, 21)

compared to (20700, 63) and (20700, 3, 21) in your code).

Do you mean that this is contiguous because of the reshape? Usually indexing over the third index is not contiguous, right?

No, in general slicing along the last dimensions is in memory order for column-major ordering. For example,

julia> x = reshape(1:6, 2, 3)
2×3 reshape(::UnitRange{Int64}, 2, 3) with eltype Int64:
 1  3  5
 2  4  6

julia> x[:, 2]  # contiguous
2-element Vector{Int64}:
 3
 4

julia> x[2, :]  # non-contiguous
3-element Vector{Int64}:
 2
 4
 6

It is true that, as a consequence, in nested loops you should normally go ‘from right to left’

for k in axes(r3, 3)
    for j in axes(r3, 2)
        for i in axes(r3, 1)
            do_stuff_with(r3[i, j, k])
        end
    end
end

so that the most rapidly changing variable (i) is the left-most index in the array. If you want, you can interpret the mul!(r3[:,:,j], tmp, R') call as an inner loop over the first two dimensions.

1 Like

Thanks! It seems that rotate3 and rotate_sv2 are the fastest options. So @eldee, how can I convert an existing 2D matrix (say 20700x21) into a format that can be used with that function?

Happy new year! :slightly_smiling_face:

You could convert an x::Matrix{Float64} of size (r, 3nm) into xsv::Matrix{SVector{3, Float64}} of size (r, nm) via e.g.

xsv = SVector{3, Float64}.(eachslice(reshape(x, size(x, 1), 3, :), dims=(1, 3)))

but note that this conversion does not come for free:

julia> x = randn(20700, 63);

julia> @benchmark SVector{3, Float64}.(eachslice(reshape($x, size($x, 1), 3, :), dims=(1, 3)))
BenchmarkTools.Trial: 1780 samples with 1 evaluation per sample.
 Range (min … max):  1.910 ms … 17.696 ms  ┊ GC (min … max):  0.00% … 88.78%
 Time  (median):     2.047 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   2.800 ms ±  1.324 ms  ┊ GC (mean ± σ):  26.49% ± 25.22%

  █▇▆▄▃▂▁                              ▁▃▁▁   ▃▃▃▂▁▁▁        ▁
  ████████▇▇▇█▇▅▆▄▄▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▇█████▇████████▇▇▇▇▆▆▆ █
  1.91 ms      Histogram: log(frequency) by time     5.59 ms <

 Memory estimate: 9.95 MiB, allocs estimate: 4.

(Constructing xsv by explicitly looping is equally fast, while using permutedims and reinterpret is slower.)

So I would advise you to try to work with this representation from the start, if possible. Most of the time it also makes the most sense to put the xyz-axis in the first dimension, either explicitly as a (3, r, nm) Array or implicitly via SVector{3}s (which is exactly the same in memory, just one reinterpret away).
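To illustrate the “one reinterpret away” remark, here is a base-Julia sketch using NTuple{3, Float64} as a stand-in for SVector{3, Float64} (the two share the same memory layout, so StaticArrays isn’t needed for the demonstration):

```julia
# A (3, r, nm) array: xyz varies fastest, here with r = 3 points and nm = 2
a = reshape(collect(1.0:18.0), 3, 3, 2)

# View the leading length-3 dimension as one 3-tuple per point,
# giving an (r, nm) matrix of points without copying
t = reinterpret(reshape, NTuple{3, Float64}, a)

size(t)   # (3, 2)
t[1, 1]   # (1.0, 2.0, 3.0), the xyz of the first point
```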

You could of course also use JADekker’s trick with SVectors, but also here I don’t see any performance improvement.

Exploiting distributivity
julia> function rotate_sv3(x, θ, axis)
           θ = deg2rad(θ)
           mean_pt = nanmean(x)
           R = SMatrix(AngleAxis(θ, axis...))
           total_mean_pt_shift = mean_pt .- R * mean_pt
           ret = similar(x)
           @inbounds for I = eachindex(x)
               ret[I] = R * x[I] .+ total_mean_pt_shift
           end
           return ret
       end
rotate_sv3 (generic function with 1 method)

julia> @benchmark rotate_sv3(x, $θ, $axis) setup=(x = randn(SVector{3, Float64}, 20700, 21))
BenchmarkTools.Trial: 481 samples with 1 evaluation per sample.
 Range (min … max):  2.920 ms … 11.639 ms  ┊ GC (min … max):  0.00% … 71.51%
 Time  (median):     3.085 ms              ┊ GC (median):     0.00%
 Time  (mean ± σ):   3.722 ms ±  1.069 ms  ┊ GC (mean ± σ):  17.26% ± 18.56%

  ██▅▄▂▃▁                         ▅▄▃▂▁▁
  ███████▆▇▄▁▄▅▄▁▅▆▄▁▁▁▁▁▁▁▁▁▁▁▁▁▄██████▇▇▇▄▁▅▁▄▁▆▄▁▄▁▁▅▄▄▁▄ ▇
  2.92 ms      Histogram: log(frequency) by time     6.42 ms <

 Memory estimate: 9.95 MiB, allocs estimate: 3.

Happy New Year!

1 Like

Indeed, the difference in performance between rotate_sv2 and rotate_sv3 is negligible. I’m sticking with the 2D array for pedagogical reasons.

Thanks to all for the great advice! :slightly_smiling_face: