I am at a loss to explain the performance difference between these two functions. Can someone help?
julia> function foo(a::AbstractVector)
           T = eltype(a)
           c = Set{T}[Set{T}() for x in a]
           return length(c)
end
foo (generic function with 1 method)
julia> function bar(a::Vector{T}) where T
           c = Set{T}[Set{T}() for x in a]
           return length(c)
end
bar (generic function with 1 method)
julia> a = rand(1:100_000, 2_000_000);
julia> eltype(a)
Int64
julia> @benchmark foo($a)
BenchmarkTools.Trial:
memory estimate: 961.30 MiB
allocs estimate: 10000004
--------------
minimum time: 3.654 s (14.18% GC)
median time: 3.797 s (17.07% GC)
mean time: 3.797 s (17.07% GC)
maximum time: 3.939 s (19.76% GC)
--------------
samples: 2
evals/sample: 1
julia> @benchmark bar($a)
BenchmarkTools.Trial:
memory estimate: 961.30 MiB
allocs estimate: 10000003
--------------
minimum time: 283.377 ms (0.00% GC)
median time: 983.086 ms (65.25% GC)
mean time: 1.080 s (68.63% GC)
maximum time: 2.720 s (87.05% GC)
--------------
samples: 6
evals/sample: 1
julia> foo(a) == bar(a)
true
I didn’t notice that at first, although it’s actually still the same issue. The explicitly specified type hides the type instability in the final value, but not inside the loop.
Also check the code_warntype output for that. You can get a hint from ::Base.Generator{Array{Int64,1},getfield(Main, Symbol("##1#2")){DataType}}, which shows that the closure is parametrized only on {DataType} and not on {Type{Int64}}.
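To make that concrete, here is a minimal sketch of two ways to keep the generic AbstractVector signature while letting the compiler see the element type inside the comprehension. The names foo_static, foo_barrier, and make_sets are made up for illustration and are not from the original post; whether the slowdown in foo reproduces may also depend on the Julia version, so treat this as a sketch rather than a guaranteed fix.

```julia
# Option 1 (assumed fix): bind the element type as a static parameter.
# T is then a compile-time constant inside the comprehension's closure,
# rather than a local variable holding a DataType value.
function foo_static(a::AbstractVector{T}) where T
    c = Set{T}[Set{T}() for x in a]
    return length(c)
end

# Option 2 (assumed fix): a function barrier. The element type is passed
# as ::Type{T}, so the inner method is specialized on T.
make_sets(::Type{T}, a) where T = Set{T}[Set{T}() for x in a]

function foo_barrier(a::AbstractVector)
    return length(make_sets(eltype(a), a))
end

a = rand(1:100, 1_000)
println(foo_static(a) == foo_barrier(a))
```

Running @code_warntype foo_static(a) and comparing it with @code_warntype foo(a) should show whether the closure parameter is still reported as DataType.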