`push!` is very slow on `Vector{Union{...}}`

I’ve only found the one mention of this below, and I couldn’t find any follow-up.

So here is an independent repro:

```julia
julia> versioninfo()
Julia Version 0.7.0-beta.0
Commit f41b1ecaec (2018-06-24 01:32 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.0 (ORCJIT, skylake)

julia> function silly_copy(xs)
           ys = empty(xs)
           for x in xs
               push!(ys, x)
           end
           ys
       end
silly_copy (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark silly_copy($(Vector{Int64}(1:1000000)))
BenchmarkTools.Trial: 
  memory estimate:  9.00 MiB
  allocs estimate:  20
  --------------
  minimum time:     5.203 ms (0.00% GC)
  median time:      5.740 ms (0.00% GC)
  mean time:        5.801 ms (3.63% GC)
  maximum time:     7.617 ms (15.03% GC)
  --------------
  samples:          831
  evals/sample:     1

julia> @benchmark silly_copy($(Vector{Union{Int64, Missing}}(1:1000000)))
BenchmarkTools.Trial: 
  memory estimate:  10.13 MiB
  allocs estimate:  20
  --------------
  minimum time:     12.880 s (0.00% GC)
  median time:      12.880 s (0.00% GC)
  mean time:        12.880 s (0.00% GC)
  maximum time:     12.880 s (0.00% GC)
  --------------
  samples:          1
  evals/sample:     1
```

Note the time units: the second version takes 12.9 s against a 5.2 ms minimum for the first, so it is more than 2000x slower!
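For anyone who wants to rerun the comparison without installing BenchmarkTools, here is a rough sketch that uses only Base's `@elapsed`. The helper `time_copy` and the smaller `n` are my own choices, not part of the original repro, and a single `@elapsed` timing is much noisier than `@benchmark`. The ratio you get will depend on which Julia version you run it on.

```julia
# Same function as in the repro above: grow an empty copy with push!.
function silly_copy(xs)
    ys = empty(xs)
    for x in xs
        push!(ys, x)
    end
    ys
end

# Hypothetical helper: run once to compile for this eltype,
# then time a second call so compilation is excluded.
function time_copy(xs)
    silly_copy(xs)
    return @elapsed silly_copy(xs)
end

n = 100_000   # smaller than the 1_000_000 in the post, to keep the slow case short
plain = Vector{Int64}(1:n)
with_missing = Vector{Union{Int64, Missing}}(1:n)

t_plain = time_copy(plain)
t_union = time_copy(with_missing)

println("Vector{Int64}:                ", t_plain, " s")
println("Vector{Union{Int64,Missing}}: ", t_union, " s")
println("ratio: ", t_union / t_plain)
```

Both calls should return an element-for-element copy with the same eltype as the input; only the timing is expected to differ.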

I believe this is also responsible for the massive performance regressions I’m seeing in `DataFrames.join` on 0.7.

Actually, I’m not sure why I posted this here 🙂

I went ahead and made an issue instead - https://github.com/JuliaLang/julia/issues/28076


On 0.6, the `Union` version benchmarks at 50 ms versus 6 ms for the non-`Union` eltype, using this slightly modified test function:

```julia
function silly_copy(xs::Vector{T}) where T
    ys = T[]
    for x in xs
        push!(ys, x)
    end
    ys
end
```
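If the cost really lies in growing the array one element at a time, two possible workarounds are sketched below. I have not checked whether either one avoids the slowdown on the affected 0.7 beta. `prealloc_copy` allocates the output at full size with `similar` and assigns by index, so the array never grows. `sizehint_copy` still uses `push!`, but reserves capacity first with `sizehint!`. Both function names are hypothetical and do not come from the thread.

```julia
# Hypothetical workaround 1: allocate the final size up front and
# fill by index, so push! (and array growth) is never used.
function prealloc_copy(xs::Vector{T}) where T
    ys = similar(xs)              # same eltype and length as xs, contents uninitialized
    for i in eachindex(xs, ys)
        ys[i] = xs[i]             # every slot is overwritten, including `missing`s
    end
    return ys
end

# Hypothetical workaround 2: keep push!, but reserve capacity first.
# Whether this sidesteps the slow path on the affected version is untested.
function sizehint_copy(xs::Vector{T}) where T
    ys = T[]
    sizehint!(ys, length(xs))
    for x in xs
        push!(ys, x)
    end
    return ys
end

xs = Union{Int64, Missing}[1, missing, 3]
println(prealloc_copy(xs))
println(sizehint_copy(xs))
```

Use `isequal` rather than `==` to compare the results with the input, because `==` returns `missing` when either side contains a `missing`.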