Disabling allocations

lmiq · December 10, 2020, 7:39pm

Very interesting. Using a simple eval did the trick:

abstract type Material end

struct Material1 <: Material
  m :: Float64
end

struct Material2 <: Material
  m :: Float64
end

struct HitPoint{T <: Material}
  p :: Float64
  r :: Float64
  m :: T
end

# the hit function (the argument is annotated with `::`; `p <: HitPoint` is not valid here)
hit(p::HitPoint) = p.r*p.p*p.m.m

# generate functors for each type
using InteractiveUtils  # provides `subtypes` outside the REPL
for T in subtypes(Material)
  eval(:((p::HitPoint{$T})() = p.r*p.p*p.m.m))
end

# naive sum
function hits(hitpoints)
  s = 0.
  for p in hitpoints 
    s += hit(p)
  end
  s
end

# with union splitting
function hits2(hitpoints)
  s = 0.
  for p in hitpoints
    if p isa HitPoint{Material1}
      s += hit(p)
    elseif p isa HitPoint{Material2}
      s += hit(p)
    end
  end
  s
end

# with functors
function hits3(hitpoints)
  s = 0.
  for p in hitpoints 
    s += p()
  end
  s
end
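
To make the `eval` loop above more concrete, here is a minimal sketch of what it generates, written so it runs on its own (the type definitions are repeated, and the field values are arbitrary illustrative numbers). Each iteration defines one call method for a concrete `HitPoint{T}`, so the object itself becomes callable:

```julia
using InteractiveUtils  # provides `subtypes` outside the REPL

abstract type Material end
struct Material1 <: Material; m::Float64; end
struct Material2 <: Material; m::Float64; end

struct HitPoint{T<:Material}
    p::Float64
    r::Float64
    m::T
end

# One functor method per concrete subtype of Material.
for T in subtypes(Material)
    @eval (p::HitPoint{$T})() = p.r * p.p * p.m.m
end

# Hand-written equivalent of what the loop defines for Material1:
#   (p::HitPoint{Material1})() = p.r * p.p * p.m.m

h1 = HitPoint(2.0, 3.0, Material1(0.5))
h2 = HitPoint(1.0, 4.0, Material2(0.25))
println(h1())  # calls the method generated for HitPoint{Material1}
println(h2())  # calls the method generated for HitPoint{Material2}
```

`@eval` is used here as shorthand for the `eval(:(...))` form in the post; both evaluate the quoted definition at module scope.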

Benchmark:

julia> using BenchmarkTools

julia> n = 1000;
julia> hitpoints = [ isodd(i) ? HitPoint(rand(),rand(),Material1(rand())) : 
              HitPoint(rand(),rand(),Material2(rand())) for i in 1:n ]

julia> @btime hits($hitpoints)
  25.215 μs (2000 allocations: 31.25 KiB)
126.63951422494503

julia> @btime hits2($hitpoints)
  1.137 μs (0 allocations: 0 bytes)
126.63951422494503

julia> @btime hits3($hitpoints)
  1.254 μs (0 allocations: 0 bytes)
126.63951422494503
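
The allocation counts can also be checked without BenchmarkTools. The following is a minimal sketch, assuming Base's `@allocated` macro is an acceptable substitute for the `@btime` report; `allocated_bytes` is a hypothetical helper name, and the definitions are repeated so the block runs on its own. The exact byte counts may differ between Julia versions, so none are shown here:

```julia
abstract type Material end
struct Material1 <: Material; m::Float64; end
struct Material2 <: Material; m::Float64; end

struct HitPoint{T<:Material}
    p::Float64
    r::Float64
    m::T
end

hit(p::HitPoint) = p.r * p.p * p.m.m

# Naive sum: the element type is only known at run time.
function hits(hitpoints)
    s = 0.0
    for p in hitpoints
        s += hit(p)
    end
    return s
end

# Manual union splitting: each branch sees a concrete type.
function hits2(hitpoints)
    s = 0.0
    for p in hitpoints
        if p isa HitPoint{Material1}
            s += hit(p)
        elseif p isa HitPoint{Material2}
            s += hit(p)
        end
    end
    return s
end

# Hypothetical helper: measure inside a function, after a warm-up call,
# so that compilation and global-scope effects are not counted.
function allocated_bytes(f, x)
    f(x)
    return @allocated f(x)
end

hitpoints = [isodd(i) ? HitPoint(rand(), rand(), Material1(rand())) :
                        HitPoint(rand(), rand(), Material2(rand())) for i in 1:1000]

println("hits:  ", allocated_bytes(hits, hitpoints), " bytes")
println("hits2: ", allocated_bytes(hits2, hitpoints), " bytes")
```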

Creating a vector whose elements all have the same concrete type gives a rough estimate of the overhead of these approaches:

julia> n = 1000;

julia> hitpoints_single = HitPoint{Material1}[ HitPoint(rand(),rand(),Material1(rand())) for i in 1:n ];

julia> @btime hits($hitpoints_single)
  991.833 ns (0 allocations: 0 bytes)
125.72367862942023

julia> @btime hits3($hitpoints_single)
  992.250 ns (0 allocations: 0 bytes)
125.72367862942023

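In this example, handling the mixed-type array takes about 15% more time with union splitting (1.137 μs vs. 0.992 μs) and about 26% more with functors (1.254 μs vs. 0.992 μs). On the single-type vector, the naive sum and the functor version take the same time.

Thus the functor approach removes the allocations just as union splitting does, at a similar cost.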
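
The overhead estimate can be reproduced from the reported timings. A minimal sketch of the arithmetic, where `overhead_percent` is a hypothetical helper name and the single-type `hits` timing is used as the baseline for `hits2` (which was not benchmarked on the single-type vector):

```julia
# Timings in nanoseconds, copied from the benchmark output above.
t_single_hits  = 991.833   # hits  on the single-type vector
t_single_hits3 = 992.250   # hits3 on the single-type vector
t_mixed_hits2  = 1137.0    # hits2 on the mixed-type vector
t_mixed_hits3  = 1254.0    # hits3 on the mixed-type vector

# Relative slowdown of `t` with respect to `baseline`, in percent.
overhead_percent(t, baseline) = 100 * (t / baseline - 1)

println("union splitting: ", round(overhead_percent(t_mixed_hits2, t_single_hits); digits = 1), " %")
println("functors:        ", round(overhead_percent(t_mixed_hits3, t_single_hits3); digits = 1), " %")
```

These are single benchmark runs, so the percentages should be read as rough indications only.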

I apologize for intervening in the discussion without too much to contribute. I think I learnt a lot, and I am grateful to you all for the patience.
