Map vs Loops & Array Comprehensions in Julia 1.0

I’ve seen some older threads on this topic, but is it still the case that loops and array comprehensions are significantly faster than map() in Julia 1.0? If so, can someone provide some intuition for why this is the case? Are there examples of when we might want to use map() if array comprehensions are generally more efficient?

7 Likes

In general, map is as fast as handwritten loops these days: higher-order functions like this have been inlined and compiled to fast code since Julia 0.5. For example:

using BenchmarkTools

fmap(x) = map(x -> 2x, x)
fcomprehension(x) = [2x for x in x]
fdot(x) = 2 .* x
function floop(x)
    y = similar(x)
    for i in eachindex(x)
        y[i] = 2*x[i]
    end
    return y
end
function floopopt(x)
    y = similar(x)
    @simd for i in eachindex(x)
        @inbounds y[i] = 2*x[i]
    end
    return y
end

x = rand(1000)
@btime fmap($x)
@btime fcomprehension($x)
@btime fdot($x)
@btime floop($x)
@btime floopopt($x);

gives

  551.676 ns (2 allocations: 7.95 KiB)
  524.476 ns (2 allocations: 7.95 KiB)
  559.751 ns (1 allocation: 7.94 KiB)
  764.900 ns (1 allocation: 7.94 KiB)
  579.556 ns (1 allocation: 7.94 KiB)

with Julia 1.0 on my machine: the plain loop is actually slightly slower unless you use tricks such as @inbounds and @simd, as in floopopt.
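
Before comparing timings, it is worth confirming that the variants really compute the same thing. A minimal sketch using only Base, restating the definitions above so that it runs on its own (the names reference and all_equal are just illustrative):

```julia
# Restated from the benchmark above so the snippet is self-contained.
fmap(x) = map(x -> 2x, x)
fcomprehension(x) = [2x for x in x]
fdot(x) = 2 .* x
function floopopt(x)
    y = similar(x)
    @simd for i in eachindex(x)
        @inbounds y[i] = 2 * x[i]
    end
    return y
end

x = rand(1000)
reference = fmap(x)

# Doubling a Float64 in [0, 1) is exact, so the results can be compared with ==.
all_equal = fcomprehension(x) == reference &&
            fdot(x) == reference &&
            floopopt(x) == reference
println(all_equal)
```

For operations that are not exact in floating point, comparing with isapprox (≈) instead of == would probably be the safer choice, since @simd is allowed to reorder some operations.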

24 Likes

This was actually a really useful Q and A.

Really stoked to learn this! Thanks for the example and clarification.

The timings include both the time spent allocating the result and the time spent computing it. I would suggest using a larger number of elements to focus on the computation. For instance, starting with the same code as above by @stevengj but with:

x = rand(10000);

yields:

julia> @btime fmap($x);
  4.119 μs (2 allocations: 78.20 KiB)

julia> @btime fcomprehension($x);
  6.006 μs (2 allocations: 78.20 KiB)

julia> @btime fdot($x);
  6.160 μs (2 allocations: 78.20 KiB)

julia> @btime floop($x);
  7.613 μs (2 allocations: 78.20 KiB)

julia> @btime floopopt($x);
  6.298 μs (2 allocations: 78.20 KiB)

on my machine (AMD Ryzen Threadripper 2950X 16-Core Processor) with Julia 1.5 and -O3 optimization. This shows that map is significantly faster on large arrays. With 1000 elements as in the original example, the timings are all very similar (from 830.247 ns for fcomprehension to 867.000 ns for floop).
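
Since the timings above include allocating the result, another way to focus on the computation alone is to write into an output array that is allocated once up front. A rough sketch, assuming in-place variants of the functions from this thread; the names fmap!, fdot! and floop! are hypothetical and not from the original code:

```julia
# Hypothetical in-place variants (names are illustrative, not from the thread).
fmap!(y, x) = map!(v -> 2v, y, x)   # map!(f, dest, src) stores f applied to src in dest
fdot!(y, x) = (y .= 2 .* x)         # fused broadcast assignment into y
function floop!(y, x)
    # eachindex(x, y) throws if the arrays do not share the same indices,
    # which is what makes the @inbounds below safe.
    @simd for i in eachindex(x, y)
        @inbounds y[i] = 2 * x[i]
    end
    return y
end

x = rand(10_000)
y = similar(x)   # allocated once, outside of anything being timed

fmap!(y, x)
println(y == 2 .* x)
```

With BenchmarkTools loaded, these could then be timed with calls such as @btime fmap!($y, $x), which should leave the cost of allocating the result out of the measurement.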

2 Likes

With

julia> versioninfo()
Julia Version 1.6.0-beta1.0
Commit b84990e1ac (2021-01-08 12:42 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz

I get

julia> x = rand(10_000);

julia> @btime fmap($x);
  5.380 μs (2 allocations: 78.20 KiB)

julia> @btime fcomprehension($x);
  5.380 μs (2 allocations: 78.20 KiB)

julia> @btime fdot($x);
  5.100 μs (2 allocations: 78.20 KiB)

julia> @btime floop($x);
  8.300 μs (2 allocations: 78.20 KiB)

julia> @btime floopopt($x);
  5.180 μs (2 allocations: 78.20 KiB)

in line with the results from 2 years ago.

2 Likes

Interesting, it looks like something strange is going on. On my laptop (Intel(R) Core™ i7-7700HQ CPU @ 2.80GHz) with Julia 1.7 I get (running consecutively):

julia> @btime floopopt($x);
  7.161 μs (2 allocations: 78.20 KiB)

julia> @btime fmap($x);
  4.601 μs (2 allocations: 78.20 KiB)

julia> @btime floopopt($x);
  4.338 μs (2 allocations: 78.20 KiB)

julia> @btime fmap($x);
  6.514 μs (2 allocations: 78.20 KiB)

So, for some reason timing is very unstable.
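
One way to see how much of this is noise is to look at the spread over many runs instead of a single number. A rough sketch using only Base and the Statistics standard library; sample_times is a hypothetical helper, not something from BenchmarkTools, and @elapsed is fairly coarse for calls that take only a few microseconds:

```julia
using Statistics

fmap(x) = map(x -> 2x, x)

# Hypothetical helper: time n calls of f(x) and return the elapsed times in seconds.
function sample_times(f, x; n = 200)
    f(x)  # warm-up call so that compilation is not included in the samples
    return [@elapsed(f(x)) for _ in 1:n]
end

x = rand(10_000)
t = sample_times(fmap, x)
println("min = ", minimum(t), " s, median = ", median(t), " s, max = ", maximum(t), " s")
```

If the minimum is stable between sessions while the median and maximum jump around, that would point to interference from the rest of the system (other processes, frequency scaling, garbage collection) rather than to a real difference between the implementations.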

On my machine with Julia 1.6, -O3 makes a big difference for the fmap time.

1 Like

Hm, it doesn’t seem to matter for me. The above was without flags (which I think is -O2?), and I had absorbed somewhere on here that -O3 basically doesn’t do any worthwhile extra optimizations anymore:

C:\Users\ngudat>julia -O3

(...)

julia> x = rand(10_000);

julia> @btime fmap($x);
  5.980 μs (2 allocations: 78.20 KiB)

julia> @btime fcomprehension($x);
  6.050 μs (2 allocations: 78.20 KiB)

julia> @btime fdot($x);
  5.433 μs (2 allocations: 78.20 KiB)

julia> @btime floop($x);
  8.100 μs (2 allocations: 78.20 KiB)

julia> @btime floopopt($x);
  5.450 μs (2 allocations: 78.20 KiB)
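
To rule out any confusion about which level a session is actually running at, the setting can be read from inside Julia. A small sketch; note that Base.JLOptions is an internal rather than a documented public API, so the field name is an assumption that may change between versions:

```julia
# Internal struct holding the command-line options of the running session.
opts = Base.JLOptions()
println("optimization level: -O", opts.opt_level)
```

As far as I know, a session started without flags should report 2 here, and one started with julia -O3 should report 3.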

These large variations in the timings of @Skoffer are surprising. Perhaps there were some other heavy tasks running?

Otherwise, from all the timings, it seems that fmap is among the fastest and yet very simple (just a call to map). This is something I definitely like about Julia: simple things often turn out to be the most efficient.

1 Like

MacBook Air M1, Julia 1.7.0 compiled from source and running natively on ARM, for x = rand(10_000) (judging by the 78.20 KiB allocations):

julia> @btime fmap($x);
  1.408 μs (2 allocations: 78.20 KiB)

julia> @btime fcomprehension($x);
  1.417 μs (2 allocations: 78.20 KiB)

julia> @btime fdot($x);
  1.204 μs (2 allocations: 78.20 KiB)

julia> @btime floop($x);
  5.132 μs (2 allocations: 78.20 KiB)

julia> @btime floopopt($x);
  1.208 μs (2 allocations: 78.20 KiB)
1 Like