ComposedFunction’s call aggressively @inlines the calls of its contained functions, so with type stability the whole thing is optimized away:
julia> @code_llvm identity(Val(1))
; Function Signature: identity(Base.Val{1})
; @ operators.jl:531 within `identity`
; Function Attrs: uwtable
define void @julia_identity_4722() #0 {
top:
ret void
}
julia> @code_llvm (foo2∘foo1)(Val(1))
; Function Signature: (::Base.ComposedFunction{typeof(Main.foo2), typeof(Main.foo1)})(Base.Val{1})
; @ operators.jl:1050 within `ComposedFunction`
; Function Attrs: uwtable
define void @julia_ComposedFunction_4719() #0 {
top:
ret void
}
But in your loop the input to the call is only inferred as the abstract Val, so you don’t have type stability and compiler optimizations get iffier. For identity, the compiler can still recognize the sole method and inline it, so nothing really happens in the loop body. The loop probably only exists because an element could be an undefined reference, and the semantics require hitting that error. A foo1 call alone also gets ignored, so it performs the same.
Once the foo2 call gets involved, the calls are no longer entirely ignored and we get varying degrees of runtime dispatch (@ijl_apply_generic) and associated allocations (which can be 0 if inputs and outputs are already boxed). foo1’s runtime dispatch is apparent, but foo2 may need runtime dispatch too if it’s not ignored and N is inferred as Any.
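One way to see the difference between the type-stable and type-unstable cases without reading LLVM output is to ask inference for the return type directly. This is only a sketch: h and loop are hypothetical stand-ins (not your foo1/foo2 or your actual loop), and Base.return_types is an unexported reflection helper whose exact results can vary between Julia versions.

```julia
# Hypothetical stand-in for a small function that dispatches on Val{N}.
h(::Val{N}) where {N} = N

# Hypothetical loop that calls h on every element of a collection.
function loop(a)
    s = 0
    for v in a
        s += h(v)
    end
    return s
end

# Concrete element type: N is known at compile time, so inference
# can work out a concrete return type for the whole loop.
stable = only(Base.return_types(loop, (Vector{Val{1}},)))

# Abstract element type: N is unknown inside the loop, so each h(v)
# call has to be resolved at run time and the result type is not concrete.
unstable = only(Base.return_types(loop, (Vector{Val},)))

result = loop(Val[Val(1), Val(2)])  # still runs, just through dynamic dispatch
```

Running @code_warntype loop(a) on your own function and array should highlight the same non-concrete types.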
It’s possible that with some additional way to construct the parametric types Val{N} where N isa Int, and likely a manual specification of such an element type, type inference would improve and aggressive inlining could do away with the runtime dispatches. However, that’s a fairly brittle situation (the real versions of foo1 and foo2 may not be small enough to inline even with @inline), and arguably the 1000 N::Int instances here shouldn’t be stored in the type domain at all.
Types tell code how to handle their instances, so making thousands of something to do the same thing really suggests 1 type with thousands of instances; doing it the other way around runs into significant optimization difficulties. More specifically, Val instances are intended for limited static information that can take up no space (sizeof(eltype(a)) == 0), but the abstract element type forces a to store a pointer for each element, which saves no space (sizeof(Ptr) == sizeof(Int)) and costs you type stability and data locality.
Sometimes inhomogeneous type collections are unavoidable, but it may also be possible to convert your types to 1 shared type in an isolated type-unstable step, so that the processing step that handles them all the same way can be optimized.
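As a rough sketch of that last suggestion: unval, bar1 and bar2 below are hypothetical names (the real foo1 and foo2 aren’t shown here, so these are stand-ins that take plain Ints), and the array is built with an explicit Val element type to mimic the situation described above.

```julia
# Hypothetical helper: pull the parameter N back out of a Val{N} instance.
unval(::Val{N}) where {N} = N

# Hypothetical stand-ins for the real processing functions, taking plain Ints.
bar1(n::Int) = n + 1
bar2(n::Int) = 2n

# Abstractly typed collection: every element has a different concrete type Val{i}.
a = Val[Val(i) for i in 1:1000]

abstract_eltype = !isconcretetype(eltype(a))                # Val alone is not concrete
zero_size       = sizeof(Val{1}) == 0                       # each instance carries no data
pointer_each    = sizeof(a) == length(a) * sizeof(Int)      # yet a stores one pointer per element

# Isolated type-unstable step: each unval call is dispatched at run time,
# but the result lands in a concretely typed Vector{Int}.
ns = Int[unval(v) for v in a]

# Type-stable processing step: eltype(ns) is Int, so the composed call
# can be specialized and inlined.
total = sum(bar2 ∘ bar1, ns)
```

The conversion pays the runtime dispatch cost once per element, and everything after it works on a plain Vector{Int} stored inline.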