Interesting. Here, with 1.7, the loop version is improved (taking into account the correction to makearr3
pointed out above):
julia> @btime makearr2(Float64, 500, 1000);
195.416 μs (1000 allocations: 3.89 MiB)
julia> @btime makearr3(Float64, 500, 1000);
83.967 μs (501 allocations: 3.88 MiB)
julia> @btime makearr4(Val(Float64), 500, 1000);
85.357 μs (501 allocations: 3.88 MiB)
In Julia 1.6 I have:
julia> @btime makearr3(Float64, 500, 1000);
189.542 μs (1001 allocations: 3.89 MiB)
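Thus, the gain associated with the specialization in 1.6 comes for free in 1.7, for some reason: makearr3 drops from 189.542 μs (1001 allocations) in 1.6 to 83.967 μs (501 allocations) in 1.7, matching the Val version makearr4.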
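For anyone who wants to check this on their own Julia version, here is a minimal sketch of the three ways of passing the element type. The `cols_*` functions are hypothetical stand-ins I made up for illustration, not the `makearr*` functions from this thread, and whether the plain version actually gets specialized depends on the compiler's heuristics in the version you run.

```julia
# Hypothetical stand-ins for illustration; NOT the makearr* functions from this thread.

# T is an untyped argument that is only forwarded, so Julia may decide
# not to specialize this method on it (a compiler heuristic, version dependent).
cols_plain(T, n, m) = [zeros(T, m) for _ in 1:n]

# The ::Type{T} annotation with a `where` clause forces specialization on T.
cols_typed(::Type{T}, n, m) where {T} = [zeros(T, m) for _ in 1:n]

# Val carries the type as a type parameter, which also forces specialization.
cols_val(::Val{T}, n, m) where {T} = [zeros(T, m) for _ in 1:n]

a = cols_plain(Float64, 500, 1000)
b = cols_typed(Float64, 500, 1000)
c = cols_val(Val(Float64), 500, 1000)

# All three build the same data; only the compiled code may differ.
@assert a == b == c
@assert length(a) == 500 && length(a[1]) == 1000
```

Running `@btime` from BenchmarkTools on each of the three, in both 1.6 and 1.7, should show whether the allocation count of the plain version matches the other two on that version.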