Best Way to Declare Arrays

I’m trying to wrap my head around arrays in Julia.

In Java, if you have an array (int[] arr, for example) with 100 elements, then to add another element, you have to make another array with size 101 and copy all the old elements over.

In Julia, push! doubles the size of allocated memory each time it needs to add space.

So how should I go about declaring an array?

I can use similar() or typeof() or even Array{...}(...) if I know the type I want. But these are either filling the array with seemingly random values, or with #undef. This causes issues for me when I want to incrementally add things to my array and iterate them. (I have a rough idea of the end size, but it won’t all be filled immediately.) If the array has random values (or even all zeros, all ones, or all -1, for instance), this requires extra care over which elements I iterate. If the array has #undef, then I have to add an extra isassigned check to my iteration, and I also have to keep track of which index I’m assigning next, because push! would even add an element after all the other #undef elements.

Can someone help me understand the efficiencies of this? What is the best practice to handle this sort of situation? Is my aversion to wanting to use an isassigned check unwarranted?

push! will not necessarily double the allocated memory for the array every time it is called, but it does need to make room for whatever you are appending to the array. It’s worth noting that in some cases push! in Julia can actually be surprisingly efficient (I think this happens because often there just happens to be enough space already anyway, Julia is certainly not optimized for low memory usage). Therefore, in cases in which either you cannot know the length of the array before iteration, or determining the length is so expensive or complicated that it isn’t worth avoiding calls to push!, you should use it, otherwise it’s better to avoid it.

There are quite a few ways of initializing arrays in Julia, some of which you already described. Indeed, Array{T}(undef, m, n) and similar are some of the most common. If the elements of this array are isbits types (T for which isbits(x) where x isa T), then all of the memory needed will be allocated when the constructor is called. In almost all cases this will mean your array is stack allocated rather than heap allocated (i.e. the most efficient way of allocating space for it).

If the elements of the array is a non-isbits type, essentially what you will wind up with is an array of pointers. Julia will allocate space for the pointers on the stack when the array is declared, but the memory for the elements themselves still needs to be allocated somehow. The reason for this is because it is not in general possible to know the size of these elements a priori (e.g. imagine filling your array with other arrays of random length).

The most common pattern is, for example

v = Vector{Float64}(undef, n)
for i ∈ 1:n
    v[i] = put_something_here()
end

You don’t need to worry about whether each element of v is already assigned when you do this. The only case I can think of in which you’d want to use isassigned when populating an array is if it was already partially populated by something else, so this’ll be rather uncommon.

Note also that list comprehensions are very commonly used in Julia and that this will infer the length and element type for you, if possible.

Not quite sure if this answers your question but hopefully I got close.

7 Likes

If you have a rough idea of the upper bound on size, you can first allocate an array of that size, and then empty! it. Julia won’t deallocate the reserved space, and push! into that array won’t allocate new memory.

1 Like

The preferred way to do this is with sizehint!. E.g. v = Int[]; sizehint!(v, 10_000) will tell julia to make space such that v has room to store 10_000 integers.

10 Likes

@squirrel, some array classes in Java also double: java - Expanding an Array? - Stack Overflow

I think what you’re missing is the following: the capacity is different from the size. It’s the capacity that doubles, not the size. Just try push! and see for yourself—the length is always equal to the number of elements you’ve push!ed.

The doubling is your friend. Consider what happens if you start with size 0 and grow incrementally to size n. If each time it allocated only one extra element, you’d have k element-copy operations to perform when growing from k to k+1. Consequently you have 1 + 2 + 3 + ... + n = n(n+1)/2 operations you need to perform.

Now consider what happens when you instead double the capacity. You only need to copy log2(n) times, and consequently you only have to do n*log2(n) work. This is a huge improvement in efficiency.

6 Likes

You’re right. “Capacity” is exactly what I was reaching for. I like that Julia doubles the capacity each time it needs to expand its capacity, I just would like to pre-define a large capacity for my array so that I can skip the growing pains.

That’s a clever idea. Are you sure that Julia will NEVER deallocate the space, or is there a possibility that the program might hit some sort of limitation and deallocate space (assumes) it no longer needs?

Ironically, this is my use case on both counts. :sweat_smile: :upside_down_face: :sob:

Thank you for your in-depth reply!

I know you saw it, but @Mason’s suggestion to use sizehint! is what you’re looking for.

5 Likes

In the current implementation, the only way to shrink the capacity is with sizehint!.

In addition to the others said: worrying about the details of GC is usually a distraction in Julia. Even if currently no space is deallocated, you should not and need not rely on this.

You should treat sizehint! as a hint: you are providing information, and the manner it will be used may change as the language evolves (even if this particular thing is unlikely to change). Being flexible about these things allows potential optimizations that are usually best left to the language.

1 Like

That’s a really good point. I don’t think I’ve every really considered programming in that sort of abstraction context before.