Why is there a massive difference between collect and [range...]?

julia> @time k = collect(1:10^7)
  0.294210 seconds (2 allocations: 76.294 MiB, 90.43% gc time)
10000000-element Vector{Int64}:
        1
        2
        ⋮
  9999999
 10000000

julia> @time k = [(1:10^7)...]
  1.663185 seconds (30.00 M allocations: 925.627 MiB, 28.68% gc time)
10000000-element Vector{Int64}:
        1
        2
        ⋮
  9999999
 10000000

Why is there such a big difference? Is splat just inefficient?

The second one is a function call with 10^7 arguments while the first one is a function call with one argument. I don’t know exactly how the compiler even deals with the second case but it’s intuitively clear that it’s more difficult to optimize, no?
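
A minimal sketch of that difference, using a hypothetical helper `nargs` (not part of Base) that just reports how many positional arguments it was handed:

```julia
# Hypothetical helper: returns the number of positional arguments received.
nargs(args...) = length(args)

r = 1:5          # small range; the same thing happens with 1:10^7, only slower

nargs(r)         # one argument: the range object itself
nargs(r...)      # five arguments: 1, 2, 3, 4, 5
```

With `1:10^7` the splatted call would receive ten million separate arguments, which is presumably where the extra time and allocations in the second benchmark come from.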

To elaborate on the function call comment: [x...] seems to translate to Base.vect(x...) after desugaring.
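
A small sketch for checking this on a short range, where splatting is harmless. It only assumes that `Base.vect` (an unexported Base function) is what the bracket syntax ends up calling:

```julia
x = 1:5                  # keep it small; 10^7 elements is the slow case

a = [x...]               # bracket syntax with a splat
b = Base.vect(x...)      # the call the bracket syntax appears to lower to
c = collect(x)           # a single-argument call, no splat

a == b == c              # all three hold the same elements

# To inspect the lowering yourself (the printed form differs between Julia versions):
# Meta.@lower [x...]
```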

Just an idle observation: if you want to make your introspection tools hang or crash, try:

julia> x = (1:10^7)
1:10000000

julia> @code_warntype [x...] # very unhappy trying to list 10^7 arguments

The general rule of thumb for splatting is not to use it unless the length of the thing you’re splatting is known to the compiler. For a Vector it isn’t, because the length is not part of the type. That’s the reason why we have both min and minimum, max and maximum, and so on.
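
A short sketch of that split, with made-up sample values:

```julia
v = [3, 1, 4, 1, 5]      # a Vector: its length is not part of its type

minimum(v)               # one argument; reduces over the whole collection
min(v...)                # same answer, but one argument per element

t = (3, 1, 4)            # a Tuple: its length is part of its type
min(t...)                # the compiler can see this is a 3-argument call
```

For a five-element vector the difference is negligible; the point is only that `minimum` keeps working the same way when the collection has 10^7 elements, while `min(v...)` turns into the many-argument call discussed above.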

I assume the compiler can still kind of deal with all those arguments because it knows they’re all of the same type, as in Vararg{Int}. But I assume that if that pattern were broken (say, Float64 and Int interleaved across 10^7 arguments), it would be worse.

What if you know at compile time that the length is 10^7? Suppose it’s a static range. Wouldn’t it still be a bad idea to try to compile a method with that many arguments?

No, them being the same type doesn’t help at all.

Yes, it would still be a bad idea.