Why is identity on an Any vector so much slower when broadcasting than when mapped?

In the REPL I was experimenting with a function that returns Vector{Any}, and I wanted to see how I could make Julia “automatically” change the type to a fixed type without having to specify it myself. I noticed that even though the end result is the same, the performance can vary widely depending on how I do it.

I found a simple example that shows an extreme difference: (I’ll suppress the outputs of the vectors because they’re long)

julia> s = Any[i for i in 1:10000]

# 3 different ways to do it:
julia> [i for i in s] == map(identity, s) == identity.(s)
true

# Now to time them:
julia> using BenchmarkTools

julia> @btime [i for i in s]
  5.217 μs (5 allocations: 78.22 KiB)
10000-element Vector{Int64}:

julia> @btime map(identity, s)
  5.493 μs (6 allocations: 78.23 KiB)
10000-element Vector{Int64}:

julia> @btime identity.(s)
  107.125 μs (9502 allocations: 226.83 KiB)
10000-element Vector{Int64}:

Whoa. The comprehension and the map are similar, but the broadcast is about 19.5 times slower than the map.

Why is the broadcast so much slower here? (If it helps, I’m using Julia 1.8.5.)


Just to confuse you a little more, on julia 1.9 on my laptop:

julia> @btime [i for i in s];
  8.824 μs (5 allocations: 78.22 KiB)

julia> @btime map(identity, s);
  12.602 μs (21 allocations: 78.94 KiB)

julia> @btime identity.(s);
  10.903 μs (13 allocations: 78.56 KiB)

Can’t time it right now, but there is also an option along these lines:

function narrow_type(A::AbstractArray)
    isconcretetype(eltype(A)) && return A
    elt = mapreduce(typeof, promote_type, A)
    convert.(elt, A)
end

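For anyone who wants to try this out, here’s a standalone sketch of that suggestion (with the closing `end` it needs) plus a quick usage check; `narrow_type` is just the helper from the post above, not anything in Base:

```julia
# Standalone version of the narrow_type suggestion above.
function narrow_type(A::AbstractArray)
    isconcretetype(eltype(A)) && return A        # already concrete: return as-is
    elt = mapreduce(typeof, promote_type, A)     # fold the element types with promote_type
    convert.(elt, A)                             # broadcasted convert builds a Vector{elt}
end

s = Any[i for i in 1:10]
t = narrow_type(s)
eltype(s), eltype(t)   # (Any, Int64) on a 64-bit machine
```

Note that for mixed numeric types this promotes everything to a common type (e.g. `Any[1, 2.0]` becomes a `Vector{Float64}`), which may or may not be what you want.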
Ah, so it seems to be fixed in 1.9 then - that’s good news! The differing number of allocations between the three methods is interesting, though; I wonder what causes that. And it also seems that comprehensions still reign supreme for now.

Yeah, and map seems to be relatively slower?

It used to be that one couldn’t trust the allocation estimates without using variable interpolation. Is that improved?

It seems that not interpolating just adds 1 allocation for the comprehension and the map, but 2 for the broadcast.

julia> @btime [i for i in $s];
  8.197 μs (4 allocations: 78.20 KiB)

julia> @btime map(identity, $s);
  10.869 μs (20 allocations: 78.92 KiB)

julia> @btime identity.($s);
  10.891 μs (11 allocations: 78.53 KiB)

And the time difference with map disappears too.


Interesting, thanks - learning about some new functions here (hadn’t heard of isconcretetype before). I tried this out, and it works, but unfortunately it’s slower than the broadcast:

julia> @btime narrow_type(s)
  1.528 ms (9502 allocations: 226.84 KiB)
10000-element Vector{Int64}: