How to create a vector of variable length in julia?


#1

I’d like to create a vector and then add value to it one by one. The thing is, I do not know how many values I’d like to add in total until the process is done. Is this doable using julia vector? Or do I need another data structure instead? Many thanks.


#2
julia> v = Int[]
0-element Array{Int64,1}

julia> push!(v, 42)
1-element Array{Int64,1}:
 42

julia> push!(v, 3)
2-element Array{Int64,1}:
 42
  3

#3

@bennedich, thank you. pop! occurs in my mind after I post this topic. :wink: Then there must be the other operation called push!.


#4

Is this an efficient operation? Am I correct in that by pushing to a vector, the compiler has to allocate new space, and copy the old vector over?
I am sure for small vectors this is a non issue, but what happens when your vector is expected to grow large? Would it allocate/copy everytime?


#5

Vectors are by default variable length in Julia. If you want fixed length, you should use StaticArrays or perhaps tuples. They are not very useful for large sizes, though.

I believe that when you repeatedly push!, extra space is allocated, something like doubling (or 1.5x?) the buffer each time you overflow the old. You can also use sizehint! if you have a rough idea what the final size will be.


#6

Yes, doubling:


#7

If you push! into a vector, julia over-allocates memory. Until your vector grows close to your total amount of memory, you get amortized O(1) push!.

Is this efficient? Well, there is an issue about push! being “horribly slow”, because the compiler cannot properly inline it. On my 2ghz laptop:

julia> using BenchmarkTools

julia> function pushtest(v, N)
       resize!(v,0)
       for i=1:N
       push!(v, i)
       end
       nothing
       end

julia> v=Int[];
julia> t=@belapsed pushtest(v, 10^4);
julia> t*2e9/1e4
13.1304
julia> @noinline function pushtest2(N)
       v=Int[]
       for i=1:N
       push!(v, i)
       end
       v
       end
julia> t=@belapsed pushtest2(10^4);
julia> t*2e9/1e4
15.9796
julia> t=@belapsed collect(1:10^4);
julia> t*2e9/1e4
1.5772

You see that push! is medium-fast: factor of 10 slower than memcopy, and don’t worry about the internal copying on growth.

The issue is overhead for the call into the runtime. But 15 cycles/push is still OK for most applications.


#8

The right question in this context is whether the operation is efficient conditional on having an unknown number of values. As others have explained, the preallocation of extra space is fairly efficient in general.

That said, if you know something about the distribution of the length, you can surely do better. Eg if you know it is \le 100 with probability 0.9999, you can use

sizehint!(v, 100)

at the beginning. In practice, these optimizations are rarely worth the complication.