What causes this performance loss when using value types?

The following are two implementations of the factorial function:

fact1(n) = fact1(Val(n))
fact1(::Val{n}) where {n} = n*fact1(Val(n-1))
fact1(::Val{1}) = 1


fact2(n) = n==1 ? 1 : n*fact2(n-1)

This may be subjective, but I find the style in β€œfact1” (Style 1) appealing. But, unfortunately, its performance is a bit poor. On my computer:

julia> VERSION

julia> @time [fact1(n) for n = 1:20] ;
  0.019617 seconds (44.86 k allocations: 2.370 MiB, 98.81% compilation time)

julia> @time [fact2(n) for n = 1:20] ;
  0.014490 seconds (45.40 k allocations: 2.339 MiB, 98.81% compilation time)

The timings are consistent over multiple runs.

I’d like to understand the reasons for the performance difference between these implementations. Is there a way to improve the performance of fact1 (Programming Style 1) so that the timing is comparable to that of Style 2?

PS: This question is not specific to the factorial function.

1 Like

The fact1(n) has to look up which fact1(::Val{n}) to call. Since n is not known when fact1(n) is compiled, a lookup in the table of method instances for fact1 must be done for every call to fact1(n). This is so because the compiler only looks at the types of the arguments (an Int in this case), not the value. A call like fact1(Val(5)) on the other hand, will compile recursively down to fact1(Val(1)), and figure out to just return 120.:

@code_native fact1(Val(5))
        mov     eax, 120

You can also see this problem with the suggestions in the performance section of the manual, i.e. a type instability. The compiler can’t figure out the type of the Val(n), it’s Val without a parameter, i.e. an abstract type. And the return type is Any:

julia> @code_warntype fact1(4)
MethodInstance for fact1(::Int64)
  from fact1(n) @ Main REPL[2]:1
1 ─ %1 = Main.Val(n)::Val
β”‚   %2 = Main.fact1(%1)::Any
└──      return %2

Thank you @sgaure for the explanation and helpful hints. I’d like to add a few more points here.

In fact, when I tried benchmarking using @btime, which was recommended over @time, the gap was even wider than in my original post.

julia> VERSION

julia> @btime [fact1(n) for n = 1:20] ;
  3.999 ΞΌs (1 allocation: 224 bytes)

julia> @btime [fact2(n) for n = 1:20] ;
  240.244 ns (1 allocation: 224 bytes)

Next, I tried to address the Any return type issue mentioned in your response by explicitly providing a return type:

fact3(n) = fact3(Val(n))
(fact3(::Val{n})::Integer) where {n} = n*fact3(Val(n-1))
fact3(::Val{1}) = 1

Which results in the forced return type being used:

julia> @code_warntype fact3(4)
MethodInstance for fact3(::Int64)
  from fact3(n) in Main at REPL[10]:1
1 ─ %1 = Main.Val(n)::Val
β”‚   %2 = Main.fact3(%1)::Integer
└──      return %2

But the benchmark was similar to fact1 or slower! Perhaps you meant Val to be the problem instead of the Any return type?

julia> @btime [fact3(n) for n = 1:20] ;
  4.851 ΞΌs (9 allocations: 544 bytes)

Nevertheless, for constants, as earlier, we have full folding:

julia> @code_warntype fact3(Val(4))
MethodInstance for fact3(::Val{4})
  from fact3(::Val{n}) where n in Main at REPL[11]:1
Static Parameters
  n = 4
1 ─ %1 = Main.Integer::Core.Const(Integer)
β”‚   %2 = $(Expr(:static_parameter, 1))::Core.Const(4)
β”‚   %3 = ($(Expr(:static_parameter, 1)) - 1)::Core.Const(3)
β”‚   %4 = Main.Val(%3)::Core.Const(Val{3}())
β”‚   %5 = Main.fact3(%4)::Core.Const(6)
β”‚   %6 = (%2 * %5)::Core.Const(24)
β”‚   %7 = Base.convert(%1, %6)::Core.Const(24)
β”‚   %8 = Core.typeassert(%7, %1)::Core.Const(24)
└──      return %8
julia> @code_native fact3(Val(4))
        movl    $24, %eax

In conclusion, this style of programming not suitable for Julia? Or is there any way we can make Styles 1 and 2 have similar performance for small functions? Or should we use Style1, which uses inbuilt multiple-dispatch, only for heavy-weight functions?

Edit 1: Added the doubt about Val or Any.
Edit 2: Added missing word.

Annotating the function with a return type only adds a conversion prior to every return in the function. That is not the performance problem, merely a symptom of something else.

The real problem is that Val{3} is not the same type as Val{4} (or any other integer). They are all different types. When you call fact1(4), that method (fact1(n)), is compiled for n an Int. For performance it is important that a compiled function knows exactly what to do, including which instances of which methods to call. But, there is a catch, the compiler only takes into account the type of n (which is Int), not the value 4. Every type of every variable in a function must be deducible from the input type(s). But the type of Val(n) cannot be deduced from the fact that n is an Int, since Val(3) has a different type than Val(4).

So the compiler does not know whether fact1(::Val{3}) or fact1(::Val{4}) or any other instance of the method fact1(::Val{n}) should be called. It must insert some decision code which is evaluated at runtime, a lookup in a table, based on the actual value of n. This lookup kills the performance.

The performance problem with your code is that it creates an object Val(n) which is of a type (Val{n}) which cannot be figured out from knowing only the type of n, it must be figured out at runtime when the value of n is known.

1 Like