That’s very interesting, thanks!
Just to add my takeaway from this (very enlightening!) discussion: I too think the current (non-)specialization behavior is a bit too tricky in this kind of situation. But reading through the explanations so far, it seems impossible to choose a default behavior that catches all performance pitfalls:
- Either we specialize on function arguments by default and provide a way to disable it (perhaps with a macro different from `@nospecialize`, but with a similar effect). But then someone writing code with a lot of closures and user-defined functions might run into performance issues.
- Or we don't specialize by default (the current behavior) and provide a way to force it. But then a case like the original post in this thread may happen, where the result of a non-inferred `map` call is not behind a function barrier.
Both scenarios require the author of the code to be aware of the issue and choose the appropriate strategy. It still feels a bit like opting out of too much specialization would be more desirable than opting in to more specialization, but in the end it’s a bit of a coin toss…
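For reference, the opt-in route for forcing specialization already exists: annotating the argument with a type parameter (`f::F` plus `where {F}`) makes Julia specialize even when `f` is only passed through to another function. A minimal sketch (the function names here are made up, not from the original post):

```julia
# `f` is only passed through to `map`, which is one of the cases where
# Julia's heuristic skips specialization, so the element type of
# `map(f, xs)` is not inferred inside this method:
compute(f, xs) = sum(map(f, xs))

# A type parameter on the argument forces specialization on the
# concrete type of `f`, restoring inference:
compute_spec(f::F, xs) where {F} = sum(map(f, xs))
```

Both versions return the same value; the difference only shows up in `@code_warntype` or in timings.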
And lastly, I feel like a better fix to the original problem would be to add a function barrier, as @Benny mentioned in passing:
```julia
function kernel_operation!(ret, coefficients, matrix)
    N, M, L = size(matrix)
    for i in 1:L, j in 1:L
        for k in 1:N
            for a in 1:M, b in 1:M
                # accumulate over k, a, b; a plain `=` would overwrite
                # all previous terms and make the inner loops pointless
                ret[i, j] += coefficients[k][a, b] * matrix[k, a, i] * matrix[k, b, j]
            end
        end
    end
    return ret
end

function process_matrix_with_barrier(f, matrix, vectors)
    N, M, L = size(matrix)
    coefficients = map(f, vectors)
    ret = zeros(ComplexF64, L, L)
    kernel_operation!(ret, coefficients, matrix)
    return ret
end
```
This achieves the same performance as specializing on the function argument and leads to (in my opinion) more readable code. Just because the underlying issue is a bit obscure doesn't mean the solution has to be.