Sorted Array implementation?

I guess some things changed since 6 years ?
Does somebody care to explain me what happens in the following code?

# compile insert!
insert!(rand(10), 1, 1.0);

v = rand(10_000_000);

@info "Inserting an element first (2x)"
@time insert!(v, 1, 1.0)
@time insert!(v, 1, 1.0)

@info "Inserting an element last (2x)"
@time insert!(v, length(v), 1.0)
@time insert!(v, length(v), 1.0)

@info "Inserting an element in the middle (2x)"
@time insert!(v, 5_000_000, 1.0)
@time insert!(v, 5_000_000, 1.0)

nothing

gives

julia> include("path/to/script.jl")
[ Info: Inserting an element first (2x)
  0.039745 seconds (1 allocation: 73.537 MiB)
  0.000003 seconds
[ Info: Inserting an element last (2x)
  0.000004 seconds (1 allocation: 16 bytes)
  0.000006 seconds (1 allocation: 16 bytes)
[ Info: Inserting an element in the middle (2x)
  0.011675 seconds
  0.007150 seconds
  • inserting first time in the start (left) seems to reallocate the whole vector
  • inserting from now on in the start is negligible with 0 allocations
  • always inserting on the end only allocates 16 bytes (2 * sizeof(Float64))
  • inserting in the middle is slow but does not allocate

Does this mean we basically shouldn’t really care whether we insert to a sorted vector (e.g. queue) in the start or in the end ?

When Julia grows an array because it doesn’t have room for more elements, it actually over-allocates the space in case more stuff needs to be added later. This significantly improves the speed of repeated insertions by avoiding the need for reallocation on every insertion.

So you can’t run a couple of insert! calls and conclude the allocation behavior, since the underlying “capacity” of the Vector is persistent state. It’s more likely that those tiny allocations you’re seeing are a result of doing the benchmarking in global scope and actually have nothing to do with growing the array.

A better indication of this is

julia> v = Int[]; for x in 1:200; @show x; @time insert!(v, length(v)+1, x); end
x = 1
  0.000004 seconds (1 allocation: 80 bytes)
x = 2
  0.000001 seconds
x = 3
  0.000001 seconds
(more "free" insertions...)
x = 9
  0.000001 seconds (1 allocation: 336 bytes)
(more "free" insertions...)
x = 42
  0.000001 seconds (1 allocation: 1.453 KiB)
(more "free" insertions...)
x = 175
  0.000018 seconds (1 allocation: 5.625 KiB)
(more "free" insertions...)

I see a nearly identical allocation pattern if I use 1 or cld(length(v)+1,2) as the insertion point. This array is too small to compare the cost of insertion location.

Ignoring the cost of allocating extra space when the buffer is full (which can happen for any insertion location), inserting in the middle will often be slower because it requires shifting elements to make space. I speculate that the best performance involves repeated insertions to either the beginning xor end (with somewhat worse performance if to both), but I haven’t done sufficient testing or code inspection to understand the actual behavior.

1 Like