If we create foo4 as
julia> function foo4(x,y)
           collect( y[x] for x in transcode(UInt8, x) )
       end
which has lowered code identical to that of foo1:
julia> @code_lowered foo4(x,y)
CodeInfo(
1 ─ %1 = Main.:(var"#27#28")
│ %2 = Core.typeof(y)
│ %3 = Core.apply_type(%1, %2)
│ #27 = %new(%3, y)
│ %5 = #27
│ %6 = Main.transcode(Main.UInt8, x)
│ %7 = Base.Generator(%5, %6)
│ %8 = Main.collect(%7)
└── return %8
)
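(The definitions of foo1 and of the test data x and y come from earlier in the thread and are not repeated here. Purely as a guess, to make the comparison easier to reproduce, the setup might look roughly like the following; the element types and sizes are assumptions, chosen only so that the result is on the order of the ~95 MiB reported below.)

julia> using BenchmarkTools

julia> x = String(rand(UInt8('a'):UInt8('z'), 100_000_000));  # assumed: ~100 MB ASCII string

julia> y = rand(UInt8, 256);  # assumed: 256-entry lookup table indexed by byte value

julia> foo1(x,y) = [ y[x] for x in transcode(UInt8, x) ];  # plausible comprehension form of foo1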
In 1.5.3 I get:
julia> @btime foo1(x,y);
80.580 ms (2 allocations: 95.37 MiB)
julia> @btime foo4(x,y);
92.395 ms (2 allocations: 95.37 MiB)
which is quite astounding.
In 1.6.0 the timings are nearly identical.
I have now installed 1.5.4, and I get:
julia> @btime foo1(x,y);
80.843 ms (2 allocations: 95.37 MiB)
julia> @btime foo2(x,y);
81.314 ms (2 allocations: 95.37 MiB)
julia> @btime foo3(x,y);
81.172 ms (2 allocations: 95.37 MiB)
julia> @btime foo4(x,y);
93.333 ms (2 allocations: 95.37 MiB)
which doesn’t reproduce your timings, so my guess is that the difference comes down to hardware (CPU).
I don’t want to investigate why foo4 is slower even though its @code_lowered output is identical to foo1’s, because I suspect the cause would be very hard to track down.
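For anyone who does want to dig in: lowering happens before type inference and optimization, so identical @code_lowered output does not guarantee identical machine code. The next layers to compare would be, for example:

julia> @code_typed foo1(x,y)   # post-inference, optimized IR; this is where the two can start to differ

julia> @code_llvm foo4(x,y)    # generated LLVM IR, for an even lower-level comparison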
What I learned: stay with 1.6, don’t look back.