Performance drawback with subtyping

Hello,

I have a performance problem with subtyping. See the code:

abstract type LineAbstract end

mutable struct LineA <: LineAbstract
      color::String
end

mutable struct LineB <: LineAbstract
      length::Int
end

mutable struct Picture
       lines::Vector{LineAbstract}
end 

I’m posting a simplified version of my real problem because it is easier to understand.
The type Picture has lines. They share some attributes, but each kind of line also has its own.
Picture stores all kinds of lines in a single field, lines, so I have the option above and a second one:

mutable struct Picture
       lines::Vector{Union{LineA, LineB}}
end 

Both perform badly, because the first uses an abstract type and the second uses a union, according to https://docs.julialang.org/en/v1/manual/performance-tips.
I can’t split the field lines into linesA and linesB.
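
For readers following along, a quick way to see what both options have in common is to ask Julia whether the element type is concrete. This is only a sketch that reuses the types from the post; the variable name lines is made up for illustration.

```julia
abstract type LineAbstract end

mutable struct LineA <: LineAbstract
    color::String
end

mutable struct LineB <: LineAbstract
    length::Int
end

# Neither element type is concrete, so a loop over such a vector cannot
# know the type of each element at compile time.
@show isconcretetype(LineAbstract)         # false
@show isconcretetype(Union{LineA, LineB})  # false
@show isconcretetype(LineA)                # true

# A literal with mixed elements widens to the common abstract supertype.
lines = [LineA("blue"), LineB(10)]
@show eltype(lines)                        # LineAbstract
```

Whether the non-concrete element type actually costs time depends on what the loop does, which is what the rest of the thread measures.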

How can I deal with it?

Sample of drawbacks:

la1 = LineA("blue");
la2 = LineA("yellow");

lb1 = LineB(10);
lb2 = LineB(50);

p = Picture([la1, la2, lb1, lb2]);

function paint(p)
       for l in p.lines
              paint(l)
       end
end

paint(l::L) where L <: LineAbstract = println("Painting a line");

Drawback in the first case

Something similar happens in the second case.

> @code_warntype paint(p)

Variables
  #self#::Core.Compiler.Const(paint, false)
  p::Picture
  @_3::Union{Nothing, Tuple{LineAbstract,Int64}}
  l::LineAbstract

Body::Nothing
1 ─ %1  = Base.getproperty(p, :lines)::Array{LineAbstract,1}
│         (@_3 = Base.iterate(%1))
│   %3  = (@_3 === nothing)::Bool
│   %4  = Base.not_int(%3)::Bool
└──       goto #4 if not %4
2 ┄ %6  = @_3::Tuple{LineAbstract,Int64}::Tuple{LineAbstract,Int64}
│         (l = Core.getfield(%6, 1))
│   %8  = Core.getfield(%6, 2)::Int64
│         Main.paint(l)
│         (@_3 = Base.iterate(%1, %8))
│   %11 = (@_3 === nothing)::Bool
│   %12 = Base.not_int(%11)::Bool
└──       goto #4 if not %12
3 ─       goto #2
4 ┄       return

Thanks a lot.


Use a type parameter:

mutable struct Picture{T<:LineAbstract}
       lines::Vector{T}
end 

Hello @mauro3,

Thanks for your answer. But with a type parameter, the lines field holds only one type of line. I need several in the same vector: lines of type A, B, etc.

I came up with an ugly solution. Does anyone have a nicer one?

@enum LineType A B 

mutable struct DataA
    color::String
end

mutable struct DataB
    length::Int
end

mutable struct Line
    t::LineType
    dataA::DataA
    dataB::DataB
    Line(t) = new(t)
end

mutable struct Picture
    lines::Vector{Line}
end

I created a Line type that holds every kind of data (dataA, dataB, etc.) and hid its construction behind facade functions:

function LineA(color)
    l = Line(A)
    l.dataA = DataA(color)
    return l
end

function LineB(length)
    l = Line(B)
    l.dataB = DataB(length)
    return l
end

That way, usage stays the same:

la1 = LineA("blue");
la2 = LineA("yellow");

lb1 = LineB(10);
lb2 = LineB(50);

p = Picture([la1, la2, lb1, lb2]);

function paint(p)
       for l in p.lines
              paint(l)
       end
end

paint(l::Line) = nothing
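
To make the idea concrete, here is a sketch of how a function could branch on the tag and touch only the matching payload. The helper describe is hypothetical and not part of the original post; the rest repeats the definitions above so the block runs on its own.

```julia
@enum LineType A B

mutable struct DataA
    color::String
end

mutable struct DataB
    length::Int
end

mutable struct Line
    t::LineType
    dataA::DataA
    dataB::DataB
    Line(t) = new(t)   # payload fields start out undefined
end

function LineA(color)
    l = Line(A)
    l.dataA = DataA(color)
    return l
end

function LineB(length)
    l = Line(B)
    l.dataB = DataB(length)
    return l
end

# Hypothetical helper: branch on the tag, read only the field that was set.
function describe(l::Line)
    if l.t == A
        return "color=" * l.dataA.color
    else
        return "length=" * string(l.dataB.length)
    end
end

lines = [LineA("blue"), LineB(10)]
@show eltype(lines)      # Line, a concrete type
@show describe.(lines)
```

The vector now has a concrete element type, at the price of one unused field per object. Reading the unset field of a Line throws an UndefRefError, so every function has to check the tag first.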

I removed the println to make the benchmark comparison easier to interpret. The equivalent method for the code in my original question would be:

paint(l::L) where L <: LineAbstract = nothing

This version is more than 4 times faster according to the BenchmarkTools package.

using BenchmarkTools

First version (main question)

@benchmark paint(p)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     35.534 ns (0.00% GC)
  median time:      35.622 ns (0.00% GC)
  mean time:        36.694 ns (0.00% GC)
  maximum time:     117.992 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     992

Second (this) version

@benchmark paint(p)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     8.288 ns (0.00% GC)
  median time:      8.598 ns (0.00% GC)
  mean time:        8.988 ns (0.00% GC)
  maximum time:     37.224 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

I hope someone has a nicer way to improve this performance. Thanks for your help.

You could try https://github.com/tkoolen/TypeSortedCollections.jl


If you only need a few types, you probably want something like

struct Picture
   lines :: Vector{Union{LineA,LineB}}
end

still, that can come with performance problems: if you define a method that acts on these types and call it on the elements of this vector, the dispatch happens at run time and carries a performance penalty. If you have many types, you will probably need some of the ideas that were discussed in these threads, because in the end they turned into a discussion of exactly that problem:

A related follow-up is here:

(I also wrote a simple example here, though I have yet to write it up in detail: https://m3g.github.io/JuliaCookBook.jl/stable/splitting/)


I played a little with your problem, and the code is below. It seems that, although a type instability is reported, it has no effect on running times for the operations I perform in this test. That means an MWE has to be interpreted with some care, and you may find different answers in more realistic examples. For instance, in the example linked above, where the structures were composed, dynamic dispatch became much slower than the other options. In all cases the functor alternative seems to be a good one.

Here goes the code:

Code
using BenchmarkTools

abstract type LineAbstract end

mutable struct LineA <: LineAbstract
      width::Float64
end

mutable struct LineB <: LineAbstract
      length::Float64
end

mutable struct Picture{T<:LineAbstract}
       lines::Vector{T}
end 

paint(l :: LineA) = l.width
paint(l :: LineB) = l.length

# Dynamical dispatch at runtime 

function paint(p :: Picture)
  s = 0.
  for l in p.lines
    s += paint(l)
  end
  s
end

# Union splitting 

function paint2(p::Picture)
  s = 0.
  for l in p.lines
    if l isa LineA
      s += paint(l)
    elseif l isa LineB
      s += paint(l)
    end
  end
  s
end

# Functors 

(line::LineA)() = line.width
(line::LineB)() = line.length

function paint3(p :: Picture)
  s = 0.
  for l in p.lines
    s += l()
  end
  s
end

# running

n = 1000
x = rand(n)
p = Picture([ rand(Bool) ? LineA(x[i]) : LineB(x[i]) for i in 1:n ]);

print(" with dynamic dispatch: "); @btime paint($p) 
print(" with splitting: "); @btime paint2($p) 
print(" with functors: "); @btime paint3($p)

# to compare with:

function naivesum(x)
  s = 0.
  for v in x
    s += v
  end
  s
end
print(" simple sum of an array of Float64 of same size: "); @btime paint3($p)

paint(p) ≈ paint2(p) ≈ paint3(p) ≈ naivesum(x)

Results:

julia> include("./paint.jl")
 with dynamic dispatch:   1.358 μs (0 allocations: 0 bytes)
 with splitting:   3.493 μs (0 allocations: 0 bytes)
 with functors:   1.438 μs (0 allocations: 0 bytes)
 simple sum of an array of Float64 of same size:   1.497 μs (0 allocations: 0 bytes)
true


With a concrete T (e.g. Picture{LineA}) this gives a Vector of a single concrete subtype, which avoids dynamic dispatch. A Vector of mixed subtypes is only possible with T = LineAbstract, and then the dynamic dispatch is back.
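
A small check of what T becomes in each case may help here. This is only a sketch: it uses immutable versions of the two line types from the code above, and the variable names homogeneous and mixed are made up for illustration.

```julia
abstract type LineAbstract end

struct LineA <: LineAbstract
    width::Float64
end

struct LineB <: LineAbstract
    length::Float64
end

struct Picture{T<:LineAbstract}
    lines::Vector{T}
end

# All elements share one concrete type, so T is that concrete type.
homogeneous = Picture([LineA(1.0), LineA(2.0)])

# Mixed elements widen the vector to the abstract supertype,
# so T becomes LineAbstract and the loop dispatches at run time again.
mixed = Picture([LineA(1.0), LineB(2.0)])

@show typeof(homogeneous)   # Picture{LineA}
@show typeof(mixed)         # Picture{LineAbstract}
```

So the parametric struct does not reject the mixed vector; it just does not help with it.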

Try writing paint2 like this:

function paint2(p::Picture)
  s = 0.
  for l in p.lines
    if l isa LineA
      s += paint(l::LineA)
    else
      s += paint(l::LineB)
    end
  end
  s
end

Does that improve performance?


That is one of the options in the code above. In this case it is actually worse, but that does not generalize.

Edit: type annotations in the function calls? I didn’t know that was even possible.
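
For anyone else surprised by this: on the right-hand side of an expression, value::T is a type assertion. It returns the value unchanged if it is a T and throws a TypeError otherwise. A minimal sketch, with a made-up function name:

```julia
# `v::Float64` inside an expression asserts the type and returns the value.
as_float(v) = v::Float64

@show as_float(2.5)      # 2.5

# A failing assertion throws a TypeError at run time.
failed = try
    as_float(1)          # an Int is not a Float64
    false
catch err
    err isa TypeError
end
@show failed             # true
```

In paint(l::LineA) the assertion tells the compiler which method will be called, which is why it can matter for performance after an isa check.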

Hi @lmiq and @greg_plowman,

I’m glad to see your answers. There is a small mistake in the last line of code of your test. It should be:

print(" simple sum of an array of Float64 of same size: "); @btime naivesum($x)

That is, it should call the naivesum function. The resulting times were:

  • with dynamic dispatch: 908.317 ns (0 allocations: 0 bytes)
  • with splitting: 2.849 μs (0 allocations: 0 bytes)
  • with functors: 1.155 μs (0 allocations: 0 bytes)
  • with cast: 799.685 ns (0 allocations: 0 bytes) (@greg_plowman’s proposal)
  • simple sum of an array of Float64 of same size: 769.153 ns (0 allocations: 0 bytes)

The details are:

> @benchmark paint(p)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     880.514 ns (0.00% GC)
  median time:      945.443 ns (0.00% GC)
  mean time:        963.119 ns (0.00% GC)
  maximum time:     4.288 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     35

> @benchmark paint2(p)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     2.919 μs (0.00% GC)
  median time:      3.111 μs (0.00% GC)
  mean time:        3.140 μs (0.00% GC)
  maximum time:     9.822 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     8

> @benchmark paint3(p)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     1.131 μs (0.00% GC)
  median time:      1.227 μs (0.00% GC)
  mean time:        1.245 μs (0.00% GC)
  maximum time:     5.595 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10

> @benchmark paint4(p)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     812.395 ns (0.00% GC)
  median time:      822.279 ns (0.00% GC)
  mean time:        834.154 ns (0.00% GC)
  maximum time:     1.867 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     86

> @benchmark naivesum(x)

BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     798.979 ns (0.00% GC)
  median time:      801.149 ns (0.00% GC)
  mean time:        816.299 ns (0.00% GC)
  maximum time:     2.100 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     94

Edit: paint4 is @greg_plowman’s proposal.


We also have to be careful not to be tricked by the simplicity of the test. If, instead of computing the sum of the elements, we compute the sum of the sine of the elements, the differences disappear (there is some fluctuation between runs):

julia> include("./paint.jl")
 with runtime dispatch:   6.507 μs (0 allocations: 0 bytes)
 with splitting:   6.621 μs (0 allocations: 0 bytes)
 with functors:   6.060 μs (0 allocations: 0 bytes)
 simple sum of an array of Float64 of same size:   6.164 μs (0 allocations: 0 bytes)
true

Code
using BenchmarkTools

abstract type LineAbstract end

mutable struct LineA <: LineAbstract
      width::Float64
end

mutable struct LineB <: LineAbstract
      length::Float64
end

mutable struct Picture{T<:LineAbstract}
       lines::Vector{T}
end 

paint(l :: LineA) = l.width
paint(l :: LineB) = l.length

# Function to evaluate
f(x::Float64) = sin(x)

# Dynamical dispatch at runtime

function paint(p :: Picture)
  s = 0.
  for l in p.lines
    s += f(paint(l))
  end
  s
end

# Union splitting

function paint2(p::Picture)
  s = 0.
  for l in p.lines
    if l isa LineA
      s += f(paint(l))
    elseif l isa LineB
      s += f(paint(l))
    end
  end
  s
end

# Functors

(line::LineA)() = line.width
(line::LineB)() = line.length

function paint3(p :: Picture)
  s = 0.
  for l in p.lines
    s += f(l())
  end
  s
end

# running

n = 1000
x = rand(n)
p = Picture([ rand(Bool) ? LineA(x[i]) : LineB(x[i]) for i in 1:n ]);

print(" with runtime dispatch: "); @btime paint($p) 
print(" with splitting: "); @btime paint2($p) 
print(" with functors: "); @btime paint3($p)

# to compare with:

function naivesum(x)
  s = 0.
  for v in x
    s += f(v)
  end
  s
end
print(" simple sum of an array of Float64 of same size: "); @btime naivesum($x)

paint(p) ≈ paint2(p) ≈ paint3(p) ≈ naivesum(x)

I think the takeaway here is that the code is performant in all cases. But, as the other threads show, this may not be true if the array of mixed types contains elements with more sophisticated structures, with various fields. In that case it seems that the runtime dispatch becomes much less efficient than the other alternatives. This is what the benchmarks of this example show.

And, editing again…

If, in that example, one creates specialized versions for all types of the hit function, the differences are again only marginal:

for type in subtypes(Material)
  eval(:(hit(p::HitPoint{$type}) = p.r*p.p*p.m.m))
  eval(:((p::HitPoint{$type})() = p.r*p.p*p.m.m))
end

Thus, to get performance, one needs to create specialized methods for all types “manually”, and that fixes the runtime dispatch problem. It does not solve the type instability of the mixed-type vector, of course, but that instability seems to be benign.
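
Since Material and HitPoint come from the other thread and are not defined here, this is a self-contained sketch of the same metaprogramming pattern with made-up types (Shape, Circle, Square, Wrapped) and made-up functions (size_of, scaled). Note that subtypes lives in InteractiveUtils, which the REPL loads automatically but a script does not.

```julia
using InteractiveUtils  # provides `subtypes` outside the REPL

abstract type Shape end

struct Circle <: Shape
    r::Float64
end

struct Square <: Shape
    a::Float64
end

# Parametric wrapper, playing the role of HitPoint{T} in the thread.
struct Wrapped{S<:Shape}
    scale::Float64
    shape::S
end

size_of(c::Circle) = c.r
size_of(s::Square) = s.a

# Generate one method per concrete subtype instead of a single generic one.
for T in subtypes(Shape)
    @eval scaled(w::Wrapped{$T}) = w.scale * size_of(w.shape)
end

@show length(methods(scaled))            # 2
@show scaled(Wrapped(2.0, Circle(3.0)))  # 6.0
```

The loop only saves typing: the result is the same as writing each scaled method by hand.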

Indeed, with the parametric type the union does not work. Thanks. Fixed that.

I agree with you @lmiq, we should study a more realistic case. So I applied the sin function as you did and changed n to 100,000,000, because with a small n the other processes of the operating system have a large influence on the timings.

My results were:

  • with dynamic dispatch: 1.120 s (0 allocations: 0 bytes)
  • with splitting: 1.121 s (0 allocations: 0 bytes)
  • with functors: 1.144 s (0 allocations: 0 bytes)
  • with cast: 1.159 s (0 allocations: 0 bytes)
  • simple sum of an array of Float64 of same size: 504.909 ms (0 allocations: 0 bytes)

true

I think the difference must be related to the getfield function, which is reasonable.

So I think we can conclude that, despite the warning from @code_warntype, performance is good with dynamic dispatch.


Just out of curiosity, does that difference remain if the structures are not mutable?

The benchmarks are probably incorrect: it should be @benchmark paint($p), not @benchmark paint(p).
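
Some context on why the $ matters: p is a non-constant global, and Julia cannot infer the type of such a variable, so without the $ the benchmark also measures an untyped lookup and a dynamic call. The sketch below shows the inference difference with plain Base tools, so it does not need BenchmarkTools. The names data, sum_global and sum_arg are made up for illustration.

```julia
data = rand(100)            # non-const global: its type may change at any time

sum_global() = sum(data)    # reads the global directly
sum_arg(x) = sum(x)         # receives the data as an argument

# The compiler cannot assume anything about `data`, so the result is Any.
@show Base.return_types(sum_global, ())

# With the value passed in, the element type is known and so is the result.
@show Base.return_types(sum_arg, (Vector{Float64},))
```

Writing $p in @benchmark or @btime interpolates the value into the benchmarked expression, so the call is measured as if p were a local variable of known type.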

It is also possible that the compiler can apply some optimization for small union types. You should try a version where the number of subtypes is larger than 2, maybe 4 or 5.

It’s better to use immutable structures. To manipulate them you can use Setfield.jl.

One last thing: as was discussed in other threads, if you can organize your data as a collection of homogeneous vectors (i.e. LineA[] and LineB[]), then you will have no issues at all and your code will be more performant than any of the solutions discussed.
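
A minimal sketch of that layout, assuming the order of lines within the picture does not matter. The names SplitPicture, lines_a, lines_b and total are hypothetical, not from the thread.

```julia
struct LineA
    width::Float64
end

struct LineB
    length::Float64
end

# Hypothetical layout: one concretely typed vector per kind of line.
struct SplitPicture
    lines_a::Vector{LineA}
    lines_b::Vector{LineB}
end

paint(l::LineA) = l.width
paint(l::LineB) = l.length

# `total` is compiled separately for each concrete vector type,
# so the call to `paint` inside the loop is resolved at compile time.
function total(lines::Vector)
    s = 0.0
    for l in lines
        s += paint(l)
    end
    return s
end

paint(p::SplitPicture) = total(p.lines_a) + total(p.lines_b)

p = SplitPicture([LineA(1.0), LineA(2.0)], [LineB(3.0)])
@show paint(p)   # 6.0
```

The cost is that the original interleaving of the lines is lost. If the drawing order matters, it would have to be stored separately.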


Indeed, things change with more types. I have updated the code to use 5 immutable types (the mutability didn’t make much of a difference). The results and the code are below. I was a little disappointed that the functor alternative is not performant. Perhaps I am doing something wrong there?

julia> include("./paint.jl")
 with runtime dispatch:   45.081 ms (1000000 allocations: 15.26 MiB)
 with splitting:   17.535 ms (0 allocations: 0 bytes)
 with annotated splitting:   17.372 ms (0 allocations: 0 bytes)
 with functors:   39.943 ms (1000000 allocations: 15.26 MiB)
 simple sum of an array of Float64 of same size:   6.258 ms (0 allocations: 0 bytes)

Code
using BenchmarkTools

abstract type LineAbstract end

struct Line1 <: LineAbstract length::Float64 end
struct Line2 <: LineAbstract length::Float64 end
struct Line3 <: LineAbstract length::Float64 end
struct Line4 <: LineAbstract length::Float64 end
struct Line5 <: LineAbstract length::Float64 end

struct Picture{T<:LineAbstract}
       lines::Vector{T}
end 

paint(l :: Line1) = l.length
paint(l :: Line2) = l.length
paint(l :: Line3) = l.length
paint(l :: Line4) = l.length
paint(l :: Line5) = l.length

# Function to evaluate
f(x::Float64) = sin(x)

# Dynamical dispatch at runtime

function paint(p :: Picture)
  s = 0.
  for l in p.lines
    s += f(paint(l))
  end
  s
end

# Union splitting

function paint2(p::Picture)
  s = 0.
  for l in p.lines
    if l isa Line1
      s += f(paint(l))
    elseif l isa Line2
      s += f(paint(l))
    elseif l isa Line3
      s += f(paint(l))
    elseif l isa Line4
      s += f(paint(l))
    elseif l isa Line5
      s += f(paint(l))
    end
  end
  s
end

# Union splitting with annotated calls

function paint2_annotated(p::Picture)
  s = 0.
  for l in p.lines
    if l isa Line1
      s += f(paint(l::Line1))
    elseif l isa Line2
      s += f(paint(l::Line2))
    elseif l isa Line3
      s += f(paint(l::Line3))
    elseif l isa Line4
      s += f(paint(l::Line4))
    elseif l isa Line5
      s += f(paint(l::Line5))
    end
  end
  s
end

# Functors

(line::Line1)() = line.length
(line::Line2)() = line.length
(line::Line3)() = line.length
(line::Line4)() = line.length
(line::Line5)() = line.length

function paint3(p :: Picture)
  s = 0.
  for l in p.lines
    s += f(l())
  end
  s
end

# running

n = 1_000_000
x = rand(n)
line_types = [ Line1, Line2, Line3, Line4, Line5 ]
p = Picture([ line_types[rand(1:5)](x[i]) for i in 1:n ]);

print(" with runtime dispatch: "); @btime paint($p) 
print(" with splitting: "); @btime paint2($p) 
print(" with annotated splitting: "); @btime paint2_annotated($p) 
print(" with functors: "); @btime paint3($p)

# to compare with:

function naivesum(x)
  s = 0.
  for v in x
    s += f(v)
  end
  s
end
print(" simple sum of an array of Float64 of same size: "); @btime naivesum($x)

paint(p) ≈ paint2(p) ≈ paint2_annotated(p) ≈ paint3(p) ≈ naivesum(x)

Hello @Skoffer,

You are right, I repeated the benchmark with $ to avoid problems with global variables:

> @benchmark paint($p)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.131 s (0.00% GC)
  median time:      1.134 s (0.00% GC)
  mean time:        1.134 s (0.00% GC)
  maximum time:     1.137 s (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1

> @benchmark paint2($p)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.132 s (0.00% GC)
  median time:      1.133 s (0.00% GC)
  mean time:        1.134 s (0.00% GC)
  maximum time:     1.140 s (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1

> @benchmark paint3($p)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.155 s (0.00% GC)
  median time:      1.155 s (0.00% GC)
  mean time:        1.156 s (0.00% GC)
  maximum time:     1.159 s (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1

> @benchmark paint4($p)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.169 s (0.00% GC)
  median time:      1.170 s (0.00% GC)
  mean time:        1.170 s (0.00% GC)
  maximum time:     1.173 s (0.00% GC)
  --------------
  samples:          5
  evals/sample:     1

> @benchmark naivesum($x)

BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     493.815 ms (0.00% GC)
  median time:      495.045 ms (0.00% GC)
  mean time:        495.227 ms (0.00% GC)
  maximum time:     497.249 ms (0.00% GC)
  --------------
  samples:          11
  evals/sample:     1

Looking at Leandro’s results with 5 types of lines and 1 million objects, I can see a big drawback relative to the naive sum:

  • dispatch: 7.2 times slower
  • splitting: 2.8 times slower
  • annotated splitting: 2.8 times slower
  • functors: 6.4 times slower
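
For reference, these factors are each timing divided by the naive-sum timing. The sketch below redoes the arithmetic with the numbers copied from Leandro’s output; the variable names are made up.

```julia
# Timings in ms, copied from the 5-type benchmark output in the thread.
baseline = 6.258   # simple sum of an array of Float64
timings = [
    "runtime dispatch"    => 45.081,
    "splitting"           => 17.535,
    "annotated splitting" => 17.372,
    "functors"            => 39.943,
]

# Slowdown factor relative to the naive sum, rounded to one decimal.
factors = [round(t / baseline; digits = 1) for (_, t) in timings]

for ((name, _), f) in zip(timings, factors)
    println(rpad(name, 22), f, " times slower")
end
```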

I think this question is still open and we have no efficient solution at all. Do you agree?

I wonder whether other languages like Java have the same problem, because there we have several ways to handle it, such as interfaces.

@lmiq how did you put your code under an expandable icon? Judging by your name, are you Brazilian?

Thanks, guys.

I do not know Java, so I cannot tell. We are assuming here that the dynamic dispatch cannot be avoided by restructuring the data. From the other thread, it seems that C++ has some kind of solution to this, but I do not know what these solutions cost in other languages. It would be nice, I think, to build a comparison to see whether something can really be done to improve things in Julia, or whether that is simply the cost of true runtime dynamic dispatch.

In Fortran, which is what I know, you cannot even arrive at this problem, because you simply cannot build an array of mixed types (at least as far as I know Fortran syntax; perhaps it is possible in more modern Fortran, but I don’t know). Whenever I faced something like that, I simply rethought the problem to avoid it.

(Actually my name is Spanish, but yes, I am Brazilian. I am a professor of chemistry at Unicamp.)

To feed our discussion, I tried to reconstruct Leandro Martinez’s experiment with 5 line types and n = 1,000,000 in Java.

If I was unfair at some point, please tell me. The results were:

  • Naivesum Mean Duration: 15.65 ms
  • OO Mean Duration with objects: 19.513 ms

The Code

import java.util.List;
import java.util.ArrayList;
import java.util.Random;

abstract class LineAbstract {
    public abstract Float getLength();
    public abstract void setLength(Float l);
}

class Line1 extends LineAbstract { 
    private Float length; 
    public Float getLength() { return length; } 
    public void setLength(Float l){ this.length = l; } 
}
class Line2 extends LineAbstract { 
    private Float length; 
    public Float getLength() { return length; } 
    public void setLength(Float l){ this.length = l; }
}
class Line3 extends LineAbstract { 
    private Float length; 
    public Float getLength() { return length; } 
    public void setLength(Float l){ this.length = l; } 
}
class Line4 extends LineAbstract { 
    private Float length; 
    public Float getLength() { return length; } 
    public void setLength(Float l){ this.length = l; } 
}
class Line5 extends LineAbstract { 
    private Float length; 
    public Float getLength() { return length; } 
    public void setLength(Float l){ this.length = l; } 
}

class Picture {
    public List<LineAbstract> lines = new ArrayList<>();
    
    public Float f(Float v){ return (float) Math.sin(v); }
    
    public Float paint(){
        float s = 0F;
        for (LineAbstract l: lines){
            s += f(l.getLength());
        }        
        return s;
    }
}

public class Experiment {
    public static void main(String[] args) throws Exception {
        int n = 1000000;
        float[] x = new float[n];
        Random r = new Random();

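        // Class<?> avoids raw-type warnings; instances are cast to LineAbstract below
        Class<?>[] line_types = {Line1.class, Line2.class, Line3.class, Line4.class, Line5.class};
        Picture p = new Picture();

        for (int i = 0; i < n; i++){
            x[i] = (float) Math.random();
            Class<?> clazz = line_types[ r.nextInt(5) ];
            // Class.newInstance() is deprecated since Java 9; go through the constructor instead
            LineAbstract la = (LineAbstract) clazz.getDeclaredConstructor().newInstance();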
            la.setLength(x[i]);
            p.lines.add( la );
        }

        int repetitions = 1000;
        long totalDuration = 0;

        float s1 = 0F;
        for (int i = 0; i < repetitions; i++){
            s1 = 0F;
            long startTime = System.currentTimeMillis();
    
            //Naivesum
            for (int j = 0; j < n; j++) 
                s1 += (float) Math.sin(x[j]);
    
            long endTime = System.currentTimeMillis();
            long duration = endTime - startTime;
            totalDuration += duration;
        }

        System.out.println("Naivesum Mean Duration: " + (float) (totalDuration) / repetitions + " ms");

        totalDuration = 0;

        float s2 = 0F;
        for (int i = 0; i < repetitions; i++){
            long startTime = System.currentTimeMillis();
    
            s2 = p.paint();
    
            long endTime = System.currentTimeMillis();
            long duration = endTime - startTime;
            totalDuration += duration;
        }

        System.out.println("OO Mean Duration with objects: " + (float) (totalDuration) / repetitions + " ms");
        
        Float tol = 0.001F;
        System.out.println("\ns1 = " + s1 + " s2 = " + s2);
        System.out.println("The results are the same: " + (Math.abs(s1 - s2) < tol) );
    }
}

I’m using Java 15.0.1.

So, for a simple sum, Java was slower than Julia: Julia took 4.849 ms (on my PC) and Java took 15.65 ms. I wasn’t expecting this.
With splitting and annotated splitting, Julia was also faster than Java for a structured list: Julia took 12.604 ms and 12.501 ms (on my PC) and Java took 19.513 ms.

With dynamic dispatch and functors, Julia was slower than Java: they took 30.833 ms and 31.213 ms.

Detailed results of Leandro Martinez’s experiments on my PC:

  • with runtime dispatch: 30.833 ms (1000000 allocations: 15.26 MiB)
  • with splitting: 12.604 ms (0 allocations: 0 bytes)
  • with annotated splitting: 12.501 ms (0 allocations: 0 bytes)
  • with functors: 31.213 ms (1000000 allocations: 15.26 MiB)

In a nutshell, if you are writing new code, you can use splitting or annotated splitting to get good performance. But if you are reusing someone else’s code, dynamic dispatch will be approximately 2.5 times slower than splitting.

(Nice to meet another Brazilian working with Julia, Leandro. I am a computer science professor at IF Goiano Ceres.)
