For-loop vs list-comprehension

If we create foo4 as

julia> function foo4(x,y)
           collect( y[x] for x in transcode(UInt8, x) )
       end
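For reference, foo1 is the for-loop variant from earlier in the thread; it is not shown in this post, but presumably it looks roughly like this (the preallocation strategy and variable names here are my assumptions, not the original code):

```julia
# Hypothetical reconstruction of the for-loop variant foo1 (the original
# definition is upthread); the signature mirrors foo4 above.
function foo1(x, y)
    bytes = transcode(UInt8, x)                    # String -> Vector{UInt8}
    out = Vector{eltype(y)}(undef, length(bytes))
    @inbounds for i in eachindex(bytes)
        out[i] = y[bytes[i]]                       # one table lookup per byte
    end
    return out
end
```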

whose lowered code is identical to that of foo1:

julia> @code_lowered foo4(x,y)
CodeInfo(
1 ─ %1 = Main.:(var"#27#28")
│   %2 = Core.typeof(y)
│   %3 = Core.apply_type(%1, %2)
│        #27 = %new(%3, y)
│   %5 = #27
│   %6 = Main.transcode(Main.UInt8, x)
│   %7 = Base.Generator(%5, %6)
│   %8 = Main.collect(%7)
└──      return %8
)
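The lowered code above just builds an anonymous closure over y (the `#27` object) and wraps it in a `Base.Generator`; the comprehension is sugar for roughly this (a sketch, with `b` standing in for the shadowed loop variable):

```julia
# Roughly what the generator expression desugars to: a closure capturing y,
# wrapped in Base.Generator and then collected.
function foo4_desugared(x, y)
    f = b -> y[b]                                  # closure capturing y
    gen = Base.Generator(f, transcode(UInt8, x))
    return collect(gen)
end
```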

I get in 1.5.3:

julia> @btime foo1(x,y);
  80.580 ms (2 allocations: 95.37 MiB)

julia> @btime foo4(x,y);
  92.395 ms (2 allocations: 95.37 MiB)

which is quite astounding.
In 1.6.0 the timings are nearly identical.

I have now installed 1.5.4, and I get:

julia> @btime foo1(x,y);
  80.843 ms (2 allocations: 95.37 MiB)

julia> @btime foo2(x,y);
  81.314 ms (2 allocations: 95.37 MiB)

julia> @btime foo3(x,y);
  81.172 ms (2 allocations: 95.37 MiB)

julia> @btime foo4(x,y);
  93.333 ms (2 allocations: 95.37 MiB)

which doesn’t reproduce your timings, so my guess is that there are differences in the hardware (CPU).

I don’t want to investigate why foo4 is slower even though its @code_lowered output is exactly identical to foo1’s, because I suspect the cause would be very hard to find.
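If someone does want to dig in: since the lowered IR matches, the divergence has to appear at a later compilation stage, so comparing the typed, LLVM, and native code would be the next step. A sketch (foo4 is repeated here so the snippet is self-contained; the arguments are assumed to be a String and a lookup table):

```julia
# The lowered IR is identical, so any difference between foo1 and foo4
# must show up after type inference or during codegen.
foo4(x, y) = collect(y[b] for b in transcode(UInt8, x))

x = "example"; y = collect(1:256)
@code_typed foo4(x, y)    # typed SSA IR after inference
@code_llvm foo4(x, y)     # LLVM IR
@code_native foo4(x, y)   # the machine code actually executed
```

Running the same three macros on foo1 with the same arguments and diffing the output stage by stage would show where the two versions first diverge.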

What I learned: stay with 1.6, don’t look back.
