Performance implications of Array mutations (`push`, `splice`, `pop`, etc.)

Could anyone speak to the performance implications of modifications of arrays that change array length? My understanding has always been it’s best to minimize modifications of array that change length, as the fact arrays generally require contiguous memory blocks means that adding an element requires the compiler to allocate a new block of memory and copy over the entire array plus the new element. Yet I see a lot of the use of push! and pop! in tutorials.

Relatedly, Julia doesn’t seem to have linked-lists as a core collection, and people seem to use Vector{Any} as though they were linked lists, but seems like if they are using contiguous blocks of memory, that’s a problem…

When an array in Julia (and in many other languages) grows beyond its capacity, its capacity is increased by a fixed factor (it might be 2, or maybe 1.5, or somewhere in between). That means that “most” push!() operations will be able to expand into that extra space without any new allocations or copying. This means that push!() in Julia has amortized constant time.

As for linked lists, I think that the (amortized) constant time append operations for Vector, combined with O(1) indexing and good cache locality mean that Vector{Any} is a good choice for storing generic lists of stuff.

Of course, if you’re going to be doing things like adding elements to the middle of an array, and you’re not interested in indexing into that array, then a linked list makes a lot of sense. There’s one available in DataStructures.jl.

6 Likes

@rdelts Brilliant, thanks!

sizehint! could also help.
https://docs.julialang.org/en/stable/stdlib/collections/

1 Like

That is also super useful, thanks @Elrod