# What causes this performance loss when using value types?

The following are two implementations of the factorial function:

``````fact1(n) = fact1(Val(n))
fact1(::Val{n}) where {n} = n*fact1(Val(n-1))
fact1(::Val{1}) = 1
``````

and

``````fact2(n) = n==1 ? 1 : n*fact2(n-1)
``````

This may be subjective, but I find the style in “fact1” (Style 1) appealing. But, unfortunately, its performance is a bit poor. On my computer:

``````julia> VERSION
v"1.8.5"

julia> @time [fact1(n) for n = 1:20] ;
0.019617 seconds (44.86 k allocations: 2.370 MiB, 98.81% compilation time)

julia> @time [fact2(n) for n = 1:20] ;
0.014490 seconds (45.40 k allocations: 2.339 MiB, 98.81% compilation time)
``````

The timings are consistent over multiple runs.

I’d like to understand the reasons for the performance difference between these implementations. Is there a way to improve the performance of fact1 (Programming Style 1) so that the timing is comparable to that of Style 2?

PS: This question is not specific to the factorial function.

1 Like

The `fact1(n)` has to look up which `fact1(::Val{n})` to call. Since `n` is not known when `fact1(n)` is compiled, a lookup in the table of method instances for `fact1` must be done for every call to `fact1(n)`. This is so because the compiler only looks at the types of the arguments (an `Int` in this case), not the value. A call like `fact1(Val(5))` on the other hand, will compile recursively down to `fact1(Val(1))`, and figure out to just return `120`.:

``````@code_native fact1(Val(5))
...
mov     eax, 120
ret
...
``````

You can also see this problem with the suggestions in the performance section of the manual, i.e. a type instability. The compiler can’t figure out the type of the `Val(n)`, it’s `Val` without a parameter, i.e. an abstract type. And the return type is `Any`:

``````julia> @code_warntype fact1(4)
MethodInstance for fact1(::Int64)
from fact1(n) @ Main REPL:1
Arguments
#self#::Core.Const(fact1)
n::Int64
Body::Any
1 ─ %1 = Main.Val(n)::Val
│   %2 = Main.fact1(%1)::Any
└──      return %2

``````

Thank you @sgaure for the explanation and helpful hints. I’d like to add a few more points here.

In fact, when I tried benchmarking using `@btime`, which was recommended over `@time`, the gap was even wider than in my original post.

``````julia> VERSION
v"1.8.5"

julia> @btime [fact1(n) for n = 1:20] ;
3.999 μs (1 allocation: 224 bytes)

julia> @btime [fact2(n) for n = 1:20] ;
240.244 ns (1 allocation: 224 bytes)
``````

Next, I tried to address the `Any` return type issue mentioned in your response by explicitly providing a return type:

``````fact3(n) = fact3(Val(n))
(fact3(::Val{n})::Integer) where {n} = n*fact3(Val(n-1))
fact3(::Val{1}) = 1
``````

Which results in the forced return type being used:

``````julia> @code_warntype fact3(4)
MethodInstance for fact3(::Int64)
from fact3(n) in Main at REPL:1
Arguments
#self#::Core.Const(fact3)
n::Int64
Body::Integer
1 ─ %1 = Main.Val(n)::Val
│   %2 = Main.fact3(%1)::Integer
└──      return %2
``````

But the benchmark was similar to `fact1` or slower! Perhaps you meant `Val` to be the problem instead of the `Any` return type?

``````julia> @btime [fact3(n) for n = 1:20] ;
4.851 μs (9 allocations: 544 bytes)
``````

Nevertheless, for constants, as earlier, we have full folding:

``````julia> @code_warntype fact3(Val(4))
MethodInstance for fact3(::Val{4})
from fact3(::Val{n}) where n in Main at REPL:1
Static Parameters
n = 4
Arguments
#self#::Core.Const(fact3)
_::Core.Const(Val{4}())
Body::Int64
1 ─ %1 = Main.Integer::Core.Const(Integer)
│   %2 = \$(Expr(:static_parameter, 1))::Core.Const(4)
│   %3 = (\$(Expr(:static_parameter, 1)) - 1)::Core.Const(3)
│   %4 = Main.Val(%3)::Core.Const(Val{3}())
│   %5 = Main.fact3(%4)::Core.Const(6)
│   %6 = (%2 * %5)::Core.Const(24)
│   %7 = Base.convert(%1, %6)::Core.Const(24)
│   %8 = Core.typeassert(%7, %1)::Core.Const(24)
└──      return %8
``````
``````julia> @code_native fact3(Val(4))
[...]
movl    \$24, %eax
retq
[...]
``````

In conclusion, this style of programming not suitable for Julia? Or is there any way we can make Styles 1 and 2 have similar performance for small functions? Or should we use Style1, which uses inbuilt multiple-dispatch, only for heavy-weight functions?

Edit 1: Added the doubt about `Val` or `Any`.
The real problem is that `Val{3}` is not the same type as `Val{4}` (or any other integer). They are all different types. When you call `fact1(4)`, that method (`fact1(n)`), is compiled for `n` an `Int`. For performance it is important that a compiled function knows exactly what to do, including which instances of which methods to call. But, there is a catch, the compiler only takes into account the type of `n` (which is `Int`), not the value `4`. Every type of every variable in a function must be deducible from the input type(s). But the type of `Val(n)` cannot be deduced from the fact that `n` is an `Int`, since `Val(3)` has a different type than `Val(4)`.
So the compiler does not know whether `fact1(::Val{3})` or `fact1(::Val{4})` or any other instance of the method `fact1(::Val{n})` should be called. It must insert some decision code which is evaluated at runtime, a lookup in a table, based on the actual value of `n`. This lookup kills the performance.
The performance problem with your code is that it creates an object `Val(n)` which is of a type (`Val{n}`) which cannot be figured out from knowing only the type of `n`, it must be figured out at runtime when the value of `n` is known.