10 times more compilation time than expected

Brian1 · September 30, 2022, 1:41am

There is my old version codes:

struct A
    a::Vector{Float64}
    n::Int
end

test_func(a::A) = begin
    a_vec,a_n = a.a,a.n
    for i = 1 : size(a_vec,1)
        a_vec[i] + a_n
    end
end;

a1 = A(rand(1000_000),1)
a2 = A(rand(1000_000),2)
a3 = A(rand(1000_000),3);
a4 = A(rand(1000_000),4);
a5 = A(rand(1000_000),5);

# run it on a fresh session to find out the compilation time
@time begin
    test_func(a1)
    test_func(a2)
    test_func(a3)
    test_func(a4)
    test_func(a5)
end;

0.002573 seconds

The there is my new version codes:

abstract type A end

struct A1 <: A
    a::Vector{Float64}
end
struct A2 <: A
    a::Vector{Float64}
end
struct A3 <: A
    a::Vector{Float64}
end
struct A4 <: A
    a::Vector{Float64}
end
struct A5 <: A
    a::Vector{Float64}
end

test_func(a::T) where {T<:A1} = begin
    for i = 1 : size(a.a,1)
        a.a[i] + 1
    end
end
test_func(a::T) where {T<:A2} = begin
    for i = 1 : size(a.a,1)
        a.a[i] + 1
    end
end
test_func(a::T) where {T<:A3} = begin
    for i = 1 : size(a.a,1)
        a.a[i] + 1
    end
end
test_func(a::T) where {T<:A4} = begin
    for i = 1 : size(a.a,1)
        a.a[i] + 1
    end
end
test_func(a::T) where {T<:A5} = begin
    for i = 1 : size(a.a,1)
        a.a[i] + 1
    end
end

a1 = A1(rand(1000_000))
a2 = A2(rand(1000_000))
a3 = A3(rand(1000_000));
a4 = A4(rand(1000_000));
a5 = A5(rand(1000_000));


# run it on a fresh session to find out the compilation time
@time begin
    test_func(a1)
    test_func(a2)
    test_func(a3)
    test_func(a4)
    test_func(a5)
end;

 0.020021 seconds (100.72 k allocations: 5.585 MiB, 99.78% compilation time)

As can be seen above, the new version code takes more than 10 times time to compile.
In my thought, the new code is more extentable than the old. So, how can I keep the style of new code and reduce the compilation time to the old?

Benny · September 30, 2022, 1:52am

The new version has 5 concrete types compared to the old version’s 1 concrete type. In your old version, only test_func(a1) required compilation, the other calls just used the existing compiled method test_func(::A). The new version had to compile test_func(::A1), test_func(::A2), test_func(::A3), test_func(::A4), test_func(::A5). I’m not exactly sure why it’s 0.020021/0.002573 = 7.781 times instead of 5, but there are many sources of timing variance that can skew 1-@time measurements. A couple examples off the top of my head: heap allocations are nondeterministic, and a multitasking operating system can interrupt the program for quick important background processes.

mkitti · September 30, 2022, 2:25am

There are also five constructors that need to be compiled.

Also why do we need where statements with these concrete types? Could we just do the following?

test_func(a::A1) = begin
    for i = 1 : size(a.a,1)
        a.a[i] + 1
    end
end

Also I’m not sure this is the best example since the bodies of these functions get compiled away.

Benny · September 30, 2022, 2:30am

I believe the constructors (A, A1, A2, A3, A4, A5) are all compiled upon their calls prior to the timing, though, so their compilation isn’t being counted.

Brian1 · September 30, 2022, 4:26am

The difference between 7.1 and 5 might just be caused by random pulse from system.
That is to say:

abstract type A end
struct A1 <: A
    a::Vector{Float64}
end
struct A2 <: A
    a::Vector{Float64}
end
.
.
.

Is there a way I can inform the compiler that test_func is all the same for all A, so that after compiling for test_func(x::A1), then test_func(x::A2) can use the compile result of test_func(x::A1) ?

jling · September 30, 2022, 5:31am

no because they are different types and different functions, don’t make them different concrete types if you don’t want them to be different, use NamedTuple or whatever, which is “anonymous” in the sense that their type is defined by field names and field types

what are you trying to achieve here? so far the code snippet is equivalent to if you never defined A2 - A5 as separate types and instead just do

const A2 = const A3 = const A4 = const A5 = A1

Benny · September 30, 2022, 6:48am

Well, if OP really wants to dispatch to different methods with different baked-in constants, OP must compile each of those methods and use different concrete input types; making A2-A5 aliases for A1 will cause those definitions to override each other so test_func only has 1 method. For such simple constants, I’d rather refactor those types as members of a parametric struct struct A{N} a::Vector{Float64} end so I can just write the one parametric method function test_funct(a::A{N}) where N, but that method will still be compiled separately for each concrete type e.g. A{1}, A{2}.

There is just a tradeoff between putting work in runtime vs compile-time (dispatching on more types and compiling more methods). In OP’s example, there doesn’t seem to be much benefit to moving n from runtime to compile-time, but there might be more complicated cases where the approach is worth it.

Brian1 · October 4, 2022, 9:56am

I find another way to do it:

Old codes:

abstract type A end

struct A1 <: A
    a::Vector{Float64}
end

struct A2 <: A
    a::Vector{Float64}
end

struct A3 <: A
    a::Vector{Float64}
end

struct A4 <: A
    a::Vector{Float64}
end

struct A5 <: A
    a::Vector{Float64}
end

struct A6 <: A
    a::Vector{Float64}
end

test_func(x::T) where {T<:A} = begin
    res = similar(x.a)
    for i = 1 : size(x.a,1)
        res[i] = x.a[i] + 1
    end
    return res
end

a1 = A1(randn(100_000))
a2 = A2(randn(100_000))
a3 = A3(randn(100_000))
a4 = A4(randn(100_000))
a5 = A5(randn(100_000))
a6 = A6(randn(100_000));

@timev begin
    test_func(a1)
    test_func(a2)
    test_func(a3)
    test_func(a4)
    test_func(a5)
    test_func(a6)
end;
 0.059963 seconds (127.07 k allocations: 11.560 MiB, 96.75% compilation time)
elapsed time (ns):  59962600
gc time (ns):       0
bytes allocated:    12121391
pool allocs:        127066
non-pool GC allocs: 0
malloc() calls:     6
free() calls:       0
minor collections:  0
full collections:   0

New codes:

abstract type A end

struct A1 <: A
    a::Vector{Float64}
end

struct A2 <: A
    a::Vector{Float64}
end

struct A3 <: A
    a::Vector{Float64}
end

struct A4 <: A
    a::Vector{Float64}
end

struct A5 <: A
    a::Vector{Float64}
end

struct A6 <: A
    a::Vector{Float64}
end

struct Wrapper
    a::Vector{Float64}
    n::Int
end

Wrapper(x::A) = Wrapper(x.a,1)

test_func(x::T) where {T<:Wrapper} = begin
    res = similar(x.a)
    for i = 1 : size(x.a,1)
        res[i] = x.a[i] + x.n
    end
    return res
end

a1 = A1(randn(100_000))
a2 = A2(randn(100_000))
a3 = A3(randn(100_000))
a4 = A4(randn(100_000))
a5 = A5(randn(100_000))
a6 = A6(randn(100_000));

@timev begin
    a1 = Wrapper(a1)
    a2 = Wrapper(a2)
    a3 = Wrapper(a3)
    a4 = Wrapper(a4)
    a5 = Wrapper(a5)
    a6 = Wrapper(a6)

    test_func(a1)
    test_func(a2)
    test_func(a3)
    test_func(a4)
    test_func(a5)
    test_func(a6)
end;
  0.021478 seconds (31.44 k allocations: 6.388 MiB, 93.96% compilation time)
elapsed time (ns):  21478000
gc time (ns):       0
bytes allocated:    6698444
pool allocs:        31438
non-pool GC allocs: 0
malloc() calls:     6
free() calls:       0
minor collections:  0
full collections:   0

As can be seen above, the new code is 3x faster than the old.
The old codes compile test_func on A1,A2,A3,A4,A5,A6, 6 times. The new codes complie Wrapper on A1,A2,A3,A4,A5,A6 and test_func on Wrapper, all sum to 7 times. But the time of compiling on Wrapper is more easier than test_func. In my real situation, test_func is a bit more complicated, so the gap in speed between old codes and new codes can be up to 12x.

In the aspect of extention, if I have a subtype of A: A7, which should be designed to plus 2.
In the old codes , I would define:

struct A7 <: A
    a::Vector{Float64}
end

test_func(x::T) where {T<:A7} = begin
    res = similar(x.a)
    for i = 1 : size(x.a,1)
        res[i] = x.a[i] + 2
    end
    return res
end

In the new codes:

struct A7 <: A
    a::Vector{Float64}
end

Wrapper(x::A7) = Wrapper(x.a,2)

In my guess, If the compilation can compile in abstract type, the compilation time on the first running would be reduced much?

Brian1 · October 4, 2022, 10:00am

I am trying to keep the ability of extension and reduce the first running time.

StefanKarpinski · October 4, 2022, 1:14pm

Very hard to say given the highly limited context provided here (can you explain what you’re actually trying to do?), but it sounds a lot like the Wrapper approach is better anyway since adding some offset value is probably not a good use of the type system.

Benny · October 4, 2022, 5:38pm

Um, I just noticed that in both those versions, you’re just adding 1 to x.a[i]. The Wrapper version adds x.n, sure, but you only defined Wrapper(x::A) = Wrapper(x.a,1) so x.n is 1 in all the compiled specializations for all subtypes of A. In your original struct A code you store a_n with different integers, not just 1.

Topic		Replies	Views
Significant decrease in performance after seemingly irrelevant changes Performance	16	1188	June 29, 2019
Dramatic performance change by adding additional unused method Performance	33	1942	November 23, 2021
Sections of strings or arrays and concrete types Performance	9	416	January 8, 2021
Performance problems when dealing with deliberately type-unstable code. How to use type knowledge better? Performance	33	2073	December 30, 2020
Tutorial on using advanced type system in Julia? New to Julia	28	3090	January 27, 2020

10 times more compilation time than expected

Related topics