Disabling allocations

anon56330260 · December 10, 2020, 1:28pm

But if you don’t know what the type of T is exactly, then it must be HitRecord{T} where T. Demonstration of what I said in your example:

julia> @code_warntype f(v)
Variables
  #self#::Core.Compiler.Const(f, false)
  v::Array{Baz,1}
  res::Float64
  @_4::Union{Nothing, Tuple{Baz,Int64}}
  z::Baz

Here type of z is Baz, and isconcretetype(Baz)==false

That’s the whole problem. There is little discussion about how to handle dynamic dispatch if dynamic dispatch is unavoidable. The common suggestions are union splitting and if-elseif-else to ensure type stability. However, when you write a (non-toy project) raytracer, dispatch at runtime is unavoidable. What should one do to avoid bad performance is unclear (for a normal user). Since there is no clear guideline and I don’t know how to solve these problems, I have to give up.

lmiq · December 10, 2020, 1:37pm

But how another language deals with that kind of data structure in a more performant manner? (a link for me to follow is appreciated).

Because I come for Fortran, and in Fortran certainly that kind of problem is rare. But not because Fortran allows something that Julia does not, but because writing a vector of mixed types in Fortran is so unnatural that one would rethink the data structure from start to not do that. And that, of course, can be done in Julia in the exact same way.

aplavin · December 10, 2020, 1:44pm

There is a package GitHub - SciML/RecursiveArrayTools.jl: Tools for easily handling objects like arrays of arrays and deeper nestings in scientific machine learning (SciML) and other applications that has something like a fast heterogenous array:

ArrayPartition(x::AbstractArray…)
An ArrayPartition A is an array, which is made up of different arrays A.x. These index like a single array, but each subarray may have a different type. However, broadcast is overloaded to loop in an efficient manner, meaning that A .+= 2.+B is type-stable in its computations, even if A.x[i] and A.x[j] do not match types. A full array interface is included for completeness, which allows this array type to be used in place of a standard array where such a type stable broadcast may be needed. One example is in heterogeneous differential equations for DifferentialEquations.jl.

An ArrayPartition acts like a single array. A[i] indexes through the first array, then the second, etc., all linearly. But A.x is where the arrays are stored. Thus, for:

using RecursiveArrayTools
A = ArrayPartition(y,z)
we would have A.x[1]==y and A.x[2]==z. Broadcasting like f.(A) is efficient.

I planned to use it myself at some point, but turned out I don’t need this. The package may fit your needs as well.

Skoffer · December 10, 2020, 2:00pm

If you read @code_warntype further you will see that despite z is of the type Baz all calculations are type stable and it is supported by the fact that there are no allocations in @btime benchmark. That is the whole idea of union splitting, you take some non concrete type and split into concrete branches. When compiler knows that it is in this or that branch it know exactly which method to apply, so no dynamic dispatch is happening and as a result there are no allocations.

However, when you write a (non-toy project) raytracer, dispatch at runtime is unavoidable. What should one do to avoid bad performance is unclear (for a normal user).

Well, as I have said, you should try union splitting. It’s rather straightforward and removes the problem of dynamical dispatch and it helps you to avoid bad performance.

On the other hand, a proper data structure or type stable code can improve performance much better than union splitting or other fancy techniques. For example, if you can store all of your materials in a structure like [Material1[], Material2[], ...] i.e. as a heterogeneous vector of homogeneous arrays, then code performance will be improved and you avoid dynamic dispatch problem completely.

anon56330260 · December 10, 2020, 2:33pm

quote answer from Loki Asteri:

A virtual method call has a slightly higher cost as you need to look up the method to call before you call it (but this is a simple table look up not a search).

This is different for dynamic dispatch. Dynamic dispatch needs to search the methods table.

Yea and no. Even you can do something to get around this problem, it doesn’t scale well… The true problem occurs when I try to extend my render to support more materials, more textures and more lights.Another problem is that, currently we have no function types in Julia. In C language, we can use function pointer and lookup table to mimic C++'s single dispatch, which is still impossible in Julia(unless you use the FunctionWrapper).

lmiq · December 10, 2020, 2:45pm

Thank you for your reply! I never tried to implement a ray tracer. But if I was to start that in Fortran probably I would start building independent lists of materials, rays, textures, and work with linked lists to deal of the combinations over which the calculations have to be performed. That can be done in Julia as well, but I can imagine how complex that can become and that there might more elegant solutions.

anon56330260 · December 10, 2020, 3:05pm

This is what I have done in the toy project. I currently have only one light type, one material type and three shape types. I do union splitting on shape types. But I want to get rid of this approach since it’s not extensible.

The problem here is not “avoid bad performance”. I already know all the tricks here to ensure type stability. The problem is to get a equal performance of C++/C. So if you think union splitting can solve all this proplems, then it should easy to answer these questions:
1.If I have a lot of material types, then is union splitting faster or slower than single dispatch in C++? If it’s slower, then how would it impact the performance of the whole raytracer?(significant or little impact?)
2.If I use a isa to check whether a value belongs to a specific type, what’s the complexity and cost of it? Is isa a trivial function that has almost no cost?How is the cost of it, compared to a linear lookup table and switch clause in C?
3.If I want to ensure that all the abstract types are splitted into concrete types before the computation, what should I do if I have multiple functions depending on a dynamic type which are invoked during different stages of rendering? For example:

function first_stage()
    #use material to calculate reflect direction
end
function second_stage()
   #use material to calculate transmission direction
end
function third_stage()
   #use materials and textures to calculate colors of light
end
function raytracers()
    first_stage()
    second_stage()
    third_stage()
end

So to use union splitting, I have to couple all the functions together, or do union splitting at every position where it’s used?

Skoffer · December 10, 2020, 3:19pm

I do not know, one should do benchmark to compare it with C/C++. And it doesn’t make sense to compare with single dispatch, you should compare performance of all code, since Julia compiler may apply additional tricks which can mitigate the effect of if branching. Of course, general rule that Julia code is 1.2-1.3 slower than it’s C/C++ counterpart, so you should keep that in mind, while benchmarking.
AFAIK, it’s very fast for concrete types, just the time of pointer comparison. But you should look at the Julia internals to get additional information.
In your example there are no types at all, so it’s hard to say. If it can be rewritten as

function raytracers()
   material = getmaterialfromscene()
   first_stage(material)
   second_stage(material)
   third_stage(material)
end

then you can do something like this

function stages(material)
   first_stage(material)
   second_stage(material)
   third_stage(material)
end

function raytracers()
   material = getmaterialfromscene()
   if material isa Material1
     stages(material)
   # continue with union splitting
   # ...
end

lmiq · December 10, 2020, 4:39pm

If one has hundreds of materials, and I suppose many functions which will have to be split like this, I guess having a list of the materials and writing macros that generate the codes for these functions would be one practical way to deal with this, isn’t it?

Skoffer · December 10, 2020, 6:33pm

Yes, macro is a reasonable way to generalize this approach. You can materialize conditions for all subtypes of the given abstract type, so in the end one can reach rather compact form.

Another thing occurred to me. If some types require function, for example, the material may define diffusion function, then there is a problem, that Julia does not dispatch on functions. But you can avoid it by using functors. I think it’s not a big problem to write a macro, that can transform something like

@material begin
a::Int 
b::Float32
abc_diffusion(x) = # some function of parameters a, b and value of x
end

to

struct Material_abc <: AbstractMaterial
   a::Int
   b::Float32
end
function (m::Material_abc)(x)
  # some function of m.a, m.b and x
end

Now you can dispatch on Material_abc and use it freely in scene rendering as usual function. With proper union splitting macro there is even no need to keep track of all types that were added.

lmiq · December 10, 2020, 7:39pm

Very interesting. Using a simple eval did the trick:

abstract type Material end

struct Material1 <: Material
  m :: Float64
end

struct Material2 <: Material
  m :: Float64
end

struct HitPoint{T <: Material}
  p :: Float64
  r :: Float64
  m :: T
end

# the hit function
hit(p <: HitPoint) = p.r*p.p*p.m.m

# generate functors for each type
for T in subtypes(Material)
  eval(:((p::HitPoint{$T})() = p.r*p.p*p.m.m))
end

# naive sum
function hits(hitpoints)
  s = 0.
  for p in hitpoints 
    s += hit(p)
  end
  s
end

# with union spliting
function hits2(hitpoints)
  s = 0.
  for p in hitpoints
    if p isa HitPoint{Material1}
      s += hit(p)
    elseif p isa HitPoint{Material2}
      s += hit(p)
    end
  end
  s
end

# with functors
function hits3(hitpoints)
  s = 0.
  for p in hitpoints 
    s += p()
  end
  s
end

Benchmark:

julia> n = 1000
julia> hitpoints = [ isodd(i) ? HitPoint(rand(),rand(),Material1(rand())) : 
              HitPoint(rand(),rand(),Material2(rand())) for i in 1:n ]

julia> @btime hits($hitpoints)
  25.215 μs (2000 allocations: 31.25 KiB)
126.63951422494503

julia> @btime hits2($hitpoints)
  1.137 μs (0 allocations: 0 bytes)
126.63951422494503

julia> @btime hits3($hitpoints)
  1.254 μs (0 allocations: 0 bytes)
126.63951422494503

If one creates a vector of the elements of the same type, one can more or less estimate the overhead of these approaches:

julia> n = 1000;

julia> hitpoints_single = HitPoint{Material1}[ HitPoint(rand(),rand(),Material1(rand())) for i in 1:n ];

julia> @btime hits($hitpoints_single)
  991.833 ns (0 allocations: 0 bytes)
125.72367862942023

julia> @btime hits3($hitpoints_single)
  992.250 ns (0 allocations: 0 bytes)
125.72367862942023

Dealing with the mixed-type array takes, in this example, 15% more time with the union spliting or functor approaches, and all approaches are the same in this case.

Thus the functor approach behaves as nicely as the union splitting.

I apologize for intervening in the discussion without too much to contribute. I think I learnt a lot, and I am grateful to you all for the patience.

Topic		Replies	Views
Option to completely disallow dynamic allocation Internals & Design	5	1246	January 13, 2017
Avoid allocation for local array General Usage	6	101	January 9, 2025
Memory on array element assignment Performance	10	442	August 3, 2022
What exactly is "allocation" in Julia? Performance question , memory-allocation	45	6032	November 4, 2022
To avoid allocations, avoid anonymous functions? New to Julia memory-allocation	5	1316	March 2, 2018

Disabling allocations

Related topics