But with this reasoning there should be no performance difference in this case, no? After all, the function is just passed on to `map`. I think I don’t understand what

actually means.

`@descend` also seems to produce identical typed code for both versions, and the types seem to be specialized in both:

## Cthulhu output

Original `process_matrix`:

```
julia> @descend process_matrix(_f, matrix, vectors)
process_matrix(f, matrix, vectors) in Main at REPL[10]:1
1 function process_matrix(f::typeof(_f), matrix::Array{ComplexF64, 3}, vectors::Vector{Vector{Float64}})::Matrix{ComplexF64}
2 N::Int64, M::Int64, L::Int64 = size(matrix::Array{ComplexF64, 3})::Tuple{Int64, Int64}::Int64
3 coefficients::Vector{Matrix{Float64}} = map(f::typeof(_f), vectors::Vector{Vector{Float64}})::Vector{Matrix{Float64}}
4 #coefficients = [ones(Float64, 4, 4) * norm(v) for v in vectors]
5 ret::Matrix{ComplexF64} = zeros(ComplexF64::Type{ComplexF64}, L::Int64, L::Int64)::Matrix{ComplexF64}
6 for i::Int64 in 1:L, j in 1:L
7 for k::Int64 in (1:N::Int64)::Int64::Union{Nothing, Tuple{Int64, Int64}}
8 for a::Int64 in 1:M, b in 1:M
9 ret::Matrix{ComplexF64}[i::Int64, j::Int64] = (coefficients::Vector{Matrix{Float64}}[k::Int64]::Matrix{Float64}[a, b::Int64]::Float64 * matrix::Array{ComplexF64, 3}[k::Int64, a::Int64, i::Int64]::ComplexF64 * matrix::Array{ComplexF64, 3}[k::Int64, b::Int64, j::Int64]::ComplexF64)::ComplexF64
10 end
11 end
12 end
13
14 return ret::Matrix{ComplexF64}
15 end
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
```

`process_matrix2` with forced specialization:

```
julia> @descend process_matrix2(_f, matrix, vectors)
process_matrix2(f::F, matrix, vectors) where F<:Function in Main at REPL[13]:1
1 function (process_matrix2(f::typeof(_f)::F, matrix::Array{ComplexF64, 3}, vectors::Vector{Vector{Float64}}) where F<:Function)::Matrix{ComplexF64}
2 N::Int64, M::Int64, L::Int64 = size(matrix::Array{ComplexF64, 3})::Tuple{Int64, Int64}::Int64
3 coefficients::Vector{Matrix{Float64}} = map(f::typeof(_f), vectors::Vector{Vector{Float64}})::Vector{Matrix{Float64}}
4 #coefficients = [ones(Float64, 4, 4) * norm(v) for v in vectors]
5 ret::Matrix{ComplexF64} = zeros(ComplexF64::Type{ComplexF64}, L::Int64, L::Int64)::Matrix{ComplexF64}
6 for i::Int64 in 1:L, j in 1:L
7 for k::Int64 in (1:N::Int64)::Int64::Union{Nothing, Tuple{Int64, Int64}}
8 for a::Int64 in 1:M, b in 1:M
9 ret::Matrix{ComplexF64}[i::Int64, j::Int64] = (coefficients::Vector{Matrix{Float64}}[k::Int64]::Matrix{Float64}[a, b::Int64]::Float64 * matrix::Array{ComplexF64, 3}[k::Int64, a::Int64, i::Int64]::ComplexF64 * matrix::Array{ComplexF64, 3}[k::Int64, b::Int64, j::Int64]::ComplexF64)::ComplexF64
10 end
11 end
12 end
13
14 return ret::Matrix{ComplexF64}
15 end
Select a call to descend into or ↩ to ascend. [q]uit. [b]ookmark.
Toggles: [w]arn, [h]ide type-stable statements, [t]ype annotations, [s]yntax highlight for Source/LLVM/Native.
Show: [S]ource code, [A]ST, [T]yped code, [L]LVM IR, [N]ative code
Actions: [E]dit source code, [R]evise and redisplay
```

I tried to reduce the MWE a bit further to figure out what’s going on, and it looks like the problem with the non-specialized version is that the output type of `map(f, vectors)` cannot be inferred, and hence every access to `coefficients` creates extra allocations (which makes sense to me).
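The inference claim can also be checked without Cthulhu. The following is only a sketch: it asks `Base.return_types` what `map` is inferred to return when the function argument is known only as `Function`, compared with the concrete `typeof(_f)`. The `_f` definition is copied from the MWE, and the exact types printed may differ between Julia versions.

```julia
using LinearAlgebra

_f(v) = ones(Float64, 4, 4) * norm(v)

# Argument types as seen by a method compiled for `f::Function` (not specialized)
abstract_rts = Base.return_types(map, (Function, Vector{Vector{Float64}}))

# Argument types as seen by a method specialized on `typeof(_f)`
concrete_rts = Base.return_types(map, (typeof(_f), Vector{Vector{Float64}}))

println("f::Function   => ", abstract_rts)
println("f::typeof(_f) => ", concrete_rts)
```

If the first line prints a non-concrete type, every later `coefficients[i]` in that method has to be dispatched at run time, which would be consistent with the extra allocations.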

## Smaller MWE

Note that the non-annotated version has about 60 × 60 = 3600 more allocations than the annotated one, which matches the number of `getindex` calls on `coefficients`:

```
using LinearAlgebra
using BenchmarkTools

vectors = [rand(2) for _ in 1:60]

# Returns a 4×4 matrix whose entries all equal norm(v)
function _f(v)
    return ones(Float64, 4, 4) * norm(v)
end

# `f` is only passed on to `map`, so this method is not specialized on its type
function process_matrix_redux(f, vectors)
    coefficients = map(f, vectors)
    N = length(vectors)
    ret = zeros(N, N)
    for i in 1:N, j in 1:N
        ret[i, j] = first(coefficients[i])
    end
    return ret
end
# 446.167 μs (3724 allocations: 107.47 KiB)

# The type parameter `F` forces specialization on the type of `f`
function process_matrix_redux_annotated(f::F, vectors) where F
    coefficients = map(f, vectors)
    N = length(vectors)
    ret = zeros(N, N)
    for i in 1:N, j in 1:N
        ret[i, j] = first(coefficients[i])
    end
    return ret
end
# 10.171 μs (123 allocations: 51.20 KiB)
```
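If the per-access cost really comes from `coefficients` having an unknown type inside the loop, a function barrier should remove most of it even without the `f::F` annotation. This is a minimal sketch under that assumption; `fill_from_coefficients!` and `process_matrix_redux_barrier` are hypothetical names that do not appear in the original MWE.

```julia
using LinearAlgebra

_f(v) = ones(Float64, 4, 4) * norm(v)

# Hypothetical helper: the hot loop lives in its own function, so it is
# compiled for the concrete runtime types of `ret` and `coefficients`.
function fill_from_coefficients!(ret, coefficients)
    for i in axes(ret, 1), j in axes(ret, 2)
        ret[i, j] = first(coefficients[i])
    end
    return ret
end

# Same signature as the non-annotated version: `f` is still only passed on.
function process_matrix_redux_barrier(f, vectors)
    coefficients = map(f, vectors)  # type may be unknown to the compiler here
    N = length(vectors)
    ret = zeros(N, N)
    # One dynamic dispatch here instead of one per loop iteration
    return fill_from_coefficients!(ret, coefficients)
end

vectors = [rand(2) for _ in 1:60]
result = process_matrix_redux_barrier(_f, vectors)  # first call compiles

# Bytes allocated by a second, already compiled call
bytes = @allocated process_matrix_redux_barrier(_f, vectors)
println("allocated bytes: ", bytes)
```

Comparing the printed number against the two versions above (for example with `@btime`) would show whether the barrier alone accounts for the difference.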

Another minor weirdness: The docs mention `(@which process_matrix_redux(_f, vectors)).specializations`, which does in fact show that there is only a non-specialized version:

```
# Right after defining everything
julia> (@which process_matrix_redux(_f, vectors)).specializations
svec(MethodInstance for process_matrix_redux(::Function, ::Vector{Vector{Float64}}), nothing, nothing, nothing, nothing, nothing, nothing, nothing)
```

However, it only does so when called *before* calling the benchmark code (I get why this also happens when calling `@code_warntype`, since it apparently triggers a new specialization, but why does it happen with `@btime`?):

```
# After doing `@btime process_matrix_redux($_f, $vectors)`
julia> (@which process_matrix_redux(_f, vectors)).specializations
svec(MethodInstance for process_matrix_redux(::Function, ::Vector{Vector{Float64}}), MethodInstance for process_matrix_redux(::typeof(_f), ::Vector{Vector{Float64}}), nothing, nothing, nothing, nothing, nothing, nothing)
```

Now there is seemingly a new specialization `MethodInstance for process_matrix_redux(::typeof(_f), ::Vector{Vector{Float64}})`, but the code that is actually run is still the non-specialized one.
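For anyone reproducing this on a newer Julia, here is a sketch of the same inspection that does not rely on the layout of the `specializations` field. It assumes `Base.specializations` is available (I believe from Julia 1.10 on) and falls back to the raw field otherwise. The list it prints only shows which `MethodInstance`s exist, not which one actually gets executed.

```julia
using InteractiveUtils  # provides @which outside the REPL
using LinearAlgebra

_f(v) = ones(Float64, 4, 4) * norm(v)

function process_matrix_redux(f, vectors)
    coefficients = map(f, vectors)
    N = length(vectors)
    ret = zeros(N, N)
    for i in 1:N, j in 1:N
        ret[i, j] = first(coefficients[i])
    end
    return ret
end

vectors = [rand(2) for _ in 1:60]
process_matrix_redux(_f, vectors)  # run once so at least one instance exists

m = @which process_matrix_redux(_f, vectors)

# Assumption: `Base.specializations` exists on Julia 1.10 and later; older
# versions expose the raw `specializations` field used in the post.
specs = if isdefined(Base, :specializations)
    collect(Base.specializations(m))
else
    [s for s in m.specializations if s !== nothing]
end

foreach(println, specs)
```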