Allocations on accessing struct

Hi

I was trying out a few different ways to implement a model I am working on. The baseline is just a 3xN array of floats holding the x, y, z coordinates of N points. Next I created a Point type to describe a single point and used a Vector of Points to represent the N points. In one case I need to select points based on their z-coordinate. In my example the Vector of Points is significantly slower than the 3xN array, and it makes far more allocations. I expected some overhead from the Point-type solution, but nothing like what I am seeing. What am I missing?

Array version:  87.450 μs (4 allocations: 28.70 KiB)
Point version:  3.016 ms (200012 allocations: 3.08 MiB)

Code:

```julia
using BenchmarkTools

struct Point
    x
    y
    z
end

function selectionBelow(p::Point, val)
    p.z < val
end

function selectionBelow(A, val)
    B = view(A,3,:)
    B .< val
end

# Generate data
A = rand(3, 200000)
points = Vector{Point}(undef,size(A,2))
for idx in axes(A,2)
    points[idx] = Point(A[1,idx], A[2,idx], A[3,idx])
end

# Evaluate
tol = 0.3
sel1 = @btime selectionBelow(A, tol);
sel2 = @btime selectionBelow.(points, tol);
```

Your struct members have type Any, which means they cannot be stored inline, and that efficient specialized code cannot be generated, since the fields can be of any type whatsoever. You must give the fields a concrete type, or a parametric type.
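The difference can be inspected directly with Base reflection. The sketch below uses two hypothetical types, `PointAny` (no annotations, as in the question) and `PointF64` (concrete Float64 fields), so both can live in one session:

```julia
# Fields without annotations are implicitly declared ::Any, so each field
# holds a reference to a boxed value whose type is unknown at compile time
struct PointAny
    x
    y
    z
end

# Fields with a concrete type
struct PointF64
    x::Float64
    y::Float64
    z::Float64
end

fieldtypes(PointAny)   # (Any, Any, Any)
fieldtypes(PointF64)   # (Float64, Float64, Float64)

# isbitstype is true only when every field is itself plain data,
# i.e. a point is just three Float64 values laid out next to each other
isbitstype(PointAny)   # false
isbitstype(PointF64)   # true
```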

You need to type annotate your struct:

```julia
struct Point
    x::Float64
    y::Float64
    z::Float64
end
```
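A minimal sketch of the question's setup with the annotated struct, assuming a fresh session. It builds one point per column of `A` with `eachcol` (a swap for the original preallocate-and-fill loop), checks that both selections agree, and shows `filter` as one way to get the selected points themselves. The `::AbstractMatrix` annotation is an addition, and BenchmarkTools is left out so the snippet runs with Base alone:

```julia
struct Point
    x::Float64
    y::Float64
    z::Float64
end

selectionBelow(p::Point, val) = p.z < val
selectionBelow(A::AbstractMatrix, val) = view(A, 3, :) .< val

A = rand(3, 200_000)

# One Point per column of A (eachcol requires Julia 1.1 or later)
points = [Point(c[1], c[2], c[3]) for c in eachcol(A)]

sel1 = selectionBelow(A, 0.3)        # BitVector over the z-row
sel2 = selectionBelow.(points, 0.3)  # BitVector from broadcasting over points

# Both masks test the same z-coordinates, so they should be identical
sel1 == sel2

# The selected points themselves, rather than a mask
below = filter(p -> selectionBelow(p, 0.3), points)
```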

Or, if you want to be able to use points with different data types, try:

```julia
struct Point{T}
    x::T
    y::T
    z::T
end
```

That way you can create a point of ints, `my_point = Point(1, 2, 3)`, or a point of floats, `my_point = Point(1.0, 2.0, 3.0)`.
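A short sketch of how the parametric version behaves, assuming a fresh session. Each concrete instantiation such as `Point{Float64}` is still a plain-data type. Because the default constructor requires all three arguments to share one `T`, a mixed call like `Point(1, 2.0, 3)` throws a `MethodError`; the promoting outer constructor at the end is an optional, hypothetical addition (a common pattern, not something from the thread) that converts the arguments to a common type first:

```julia
struct Point{T}
    x::T
    y::T
    z::T
end

p_int   = Point(1, 2, 3)        # T is inferred as Int
p_float = Point(1.0, 2.0, 3.0)  # T is inferred as Float64

# Every concrete instantiation stores its coordinates inline
isbitstype(Point{Float64})   # true
isbitstype(Point{Int})       # true

# Hypothetical outer constructor: promote mixed arguments to one type.
# It is less specific than the default Point(::T, ::T, ::T), so calls whose
# arguments already share a type still go straight to the default constructor.
Point(x, y, z) = Point(promote(x, y, z)...)

p_mixed = Point(1, 2.0, 3)      # a Point{Float64}
```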

Thank you both. Now the Point-type version comes out faster than the array version.

Also, put a dollar sign in front of `A` or `points` in the benchmark call so that only the selection is benchmarked.
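A sketch of the two benchmark forms side by side, assuming BenchmarkTools is installed. `@btime` prints its own timing, so no expected numbers are given here:

```julia
using BenchmarkTools

selectionBelow(A::AbstractMatrix, val) = view(A, 3, :) .< val

A = rand(3, 200_000)
tol = 0.3

# A and tol are non-constant globals here. Without $, the benchmarked
# expression reads them as globals of unknown type on every evaluation.
@btime selectionBelow(A, tol);

# With $, the current values of A and tol are interpolated into the
# benchmark expression, so the timing covers only the selection itself.
@btime selectionBelow($A, $tol);
```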

Thank you for the suggestion. I am not sure it makes sense for my benchmark; in any real application I would need to pass data to the function. I usually see the $ (dollar sign) in cases where the benchmark call includes a call to rand(), where it keeps the benchmark from also timing rand(). Am I missing something?

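No, it isn't just for rand.
With the $ you tell BenchmarkTools to treat the variable as a local variable (as it will be in the actual implementation), even if you are benchmarking in global scope. Without it, the benchmark also measures the cost of accessing a non-constant global variable, whose type the compiler cannot rely on.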
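The effect of a non-constant global can be seen without BenchmarkTools by asking the compiler what it infers. In this sketch the names `data`, `sum_global`, `sum_arg`, `CDATA` and `sum_const` are hypothetical, and `Base.return_types` is an unexported reflection helper. Interpolating with $ is, roughly, like the argument version: the benchmarked code sees a value of known type instead of a global binding:

```julia
data = rand(1_000)          # non-const global: it could be rebound to any type later

sum_global() = sum(data)    # reads the global, so its type is unknown at compile time
sum_arg(v) = sum(v)         # receives the data as an argument and specializes on its type

const CDATA = rand(1_000)   # a const global has a fixed type the compiler can use
sum_const() = sum(CDATA)

Base.return_types(sum_global, ())               # inferred: Any
Base.return_types(sum_arg, (Vector{Float64},))  # inferred: Float64
Base.return_types(sum_const, ())                # inferred: Float64
```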
