Allocation with StaticArrays

I thought I got better at this, but there’s a function that’s creating allocations and I can’t figure out why. I’ve spent a few hours troubleshooting this with no success. Here’s the function:

using StaticArrays
function transformPoint(point,transformationMatrix)
    homogeneousPoint = SA_F64[point[1],point[2],point[3],1.0];
    homogeneousPoint = transformationMatrix*homogeneousPoint;
    return SA_F64[homogeneousPoint[1],homogeneousPoint[2],homogeneousPoint[3]];

where the inputs are a SVector of length 3 and a 4x4 SMatrix. For instance:

transformationMatrix=SA_F64[1 2 3 4; 5 6 7 8; 1 2 3 4; 5 6 7 8];

Any hints as to why this is allocating?
Thanks a lot!

As far as I can tell, it isn’t:

julia> @btime transformPoint($point, $transformationMatrix)
  0.017 ns (0 allocations: 0 bytes)
3-element SVector{3, Float64} with indices SOneTo(3):

Note that the sub-nanosecond timing means that the compiler has optimized away the entire function, which is pretty neat. We can avoid that by hiding the input behind a Ref:

julia> @btime transformPoint(x[], $transformationMatrix) setup=(x = Ref($point))
  1.500 ns (0 allocations: 0 bytes)

You can write this simpler and more general:

function transform(p, M) 
    p1 = push(p, one(eltype(p))  # push(p, true) could work too
    return pop(M * p1) 

Maybe one could also simplify the expressions by using splatting:
homogeneousPoint = SA_F64[point...,1.0]
return SA_F64[homogeneousPoint[1:3]...]

I’m skeptical that this works, did you test it? 1:3 does not tell the compiler about size.

One could use p[SOneTo(3)], but pop is more general in just removing the last element.

It seems to work but it allocates, while the first splatting does not allocate.

Writing the return as follows with views does not allocate:
return @views SVector{3,Float64}(homogeneousPoint[1:3])

Is there any shorter way of writing it?

Not shorter, but more general:

SVector{N, Float64}(ntuple(i->v[i],N)...)

I’ve been using this pattern, but I like the suggestion of @DNF of using push and pop in some cases.

Thanks, pop seems to be the simplest by far and the whole shebang could have been written as:
transformPoint(p, M) = pop(M * SA_F64[p...,1.0])

return homogeneousPoint(SOneTo(3))

1 Like

Yeah, when I first wrote the function and tested it in isolation, it wasn’t. But when I run it within another function I get allocations.
I’m having a hard time reproducing it, but this seems to do it:

function tmp(p1,p2,p3,M)
       return p1,p2,p3
M=SA_F64[1 2 3 4; 5 6 7 8; 1 2 3 4; 5 6 7 8];

@time p1,p2,p3=tmp(p1,p2,p3,M)
  0.000005 seconds (7 allocations: 320 bytes)

As far as I can tell, both of these allocate even worse than mine. My experience with Julia is that fancy short code typically allocates (at least at first) and ugly code (like my function) does not. I avoid splatting and so on at all costs.

No, they don’t allocate anything, the problem is your benchmarking. You cannot use @time for this. Use BenchmarkTools.jl.

No, no, no. This is way off. Write short, elegant code, and splat static arrays and tuples to your heart’s content. Just make sure that you do proper benchmarking with BenchmarkTools.

Don’t write ugly code.

jl> @btime transform(p[], M[]) setup=(p=Ref(@SVector rand(3)); M = Ref(@SMatrix rand(4,4)))
  1.500 ns (0 allocations: 0 bytes)

You’re right, but now I can’t reproduce my issue at all. My function is crazy slow at the moment and I have no idea why. I thought it was this part, but maybe it’s something else altogether. I benchmark it with @btime and see a ton of allocations, but I don’t know where they are coming from. Any ideas on troubleshooting this? I tried @profview, but it just showed that the code was on my main function for a long time, doing nothing (it seemed).

Wish I could, but typically that makes the process 50x slower, as I write nice looking code and then spend a day trying to fix allocations. I need to get smarter before I write pretty functions =.

Which function is slow? transformPoint or something bigger?

The bigger one, that calls transformPoint, but also a bunch of other stuff. It’s a big set of for loops and pretty hard to debug by parts with @btime.

I take it back. It was easy to debug. I just commented out the lines with transformPoint (so the code is not doing the right thing, but it is running) and a crap ton of allocations went away. So transformPoint is the issue, but I don’t know why.

Then there’s some sort of type instability probably. Can you share a bit more of the code?

Also, try @code_warntype

That was my first instinct, but @code_warntype shows everything blue.

This is what the function looks like:

function myfun!(bodies,solver)
    if solver.condition
    else # this is where we are
        Threads.@threads for k in 1:M
            point₁,point₂,point₃ = ...
            for j in 1:N
                for b in 1:B
                    transformationMatrix = solver.Matrices[b];
                    point₁ = transformPoint(point₁,transformationMatrix);
					point₂ = transformPoint(point₂,transformationMatrix);
                    point₃ = transformPoint(point₃,transformationMatrix);
    return nothing;

Unfortunately I can’t easily share something easy to reproduce.

Check @code_warntype without @threads since @threads will hide its body since it gets pulled out in a closure

julia> function f()
           Threads.@threads for i in 1:5
               x = rand() > 0.5 ? 1 : "1" # type unstable
f (generic function with 1 method)

julia> @code_warntype f() # does not show any type instability

julia> function g()
           for i in 1:5
               x = rand() > 0.5 ? 1 : "1"
g (generic function with 1 method)

julia> @code_warntype g() # shows type unstable
  @_2::Union{Nothing, Tuple{Int64, Int64}}
  x::Union{Int64, String}
  @_5::Union{Int64, String}
1 Like

Thanks a lot! This finally pointed me towards the issue, which as @DNF assumed, is a type instability problem.
Basically, my matrix was defined as Vector{StaticArrays.SMatrix{4,4,Float64}}, which is not good enough, apparently. I switched to Vector{StaticArrays.SMatrix{4,4,Float64,16}} and it works!
Thanks again, everyone!

1 Like