Splat-to-append to tuple-of-arrays, or push! on array-of-arrays?


#1

I’m writing a function that needs to return a variable-length collection of arrays (depending on parameters in a file as well as optional function arguments). Each array may also have a different size. I can return the collection as a tuple of arrays or as an array of arrays. In the function, I want to allocate memory for the collection before filling in the array values. I do realize that a tuple is immutable, but it turns out that one can alter the array elements even when they’re indexed via the tuple(!).

For the tuple-based approach, it would look something like this:

t = (zeros(Int,2),)
t = t...,zeros(Int,3)
t[1][2] = 1 # this works even though t is immutable!
@show t;

which produces

t = ([0, 1], [0, 0, 0])

For the array-based approach, it goes like this:

a = [Int[]]
pop!(a)
push!(a,zeros(Int,2))
push!(a,zeros(Int,3))
a[1][2] = 1
@show a;

which gives

a = Array{Int64,1}[[0, 1], [0, 0, 0]]

One advantage to the array approach is that I can push! inside a loop, without any special assignment on the first iteration. I couldn’t figure out how to do that with the tuple approach, because one cannot splat an empty tuple.
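For reference, here is a minimal sketch of what I mean by the loop version (the sizes here are made up; in practice they're determined by the file and the optional arguments):

```julia
sizes = [2, 3, 5]   # stand-ins for the sizes determined by the file/arguments
a = [Int[]]
pop!(a)             # start from an empty collection of arrays
for n in sizes
    push!(a, zeros(Int, n))
end
@show a;
```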

Other than that detail, which approach is best? I hope to get an answer before benchmarking both approaches.

Thanks in advance!


#2

Benchmarking both approaches is almost certainly the answer :wink:. https://github.com/JuliaCI/BenchmarkTools.jl is your friend here.

As a general rule of thumb, tuples are a good choice when the number of elements is (a) small and (b) inferrable by the compiler. Because a tuple’s length is part of its type, a function that returns a tuple whose length depends on data (as in your case) will not have an inferrable return type. That may make performance of your code worse, although benchmarking is still the best way to know for sure. Likewise, appending to a tuple in a loop is likely to be less efficient because the type of the tuple will change every time you append to it.
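To see this concretely, here's a small illustration of how each append produces a brand-new tuple type:

```julia
t = (zeros(Int, 2),)
T1 = typeof(t)            # a 1-element tuple type
t = (t..., zeros(Int, 3))
T2 = typeof(t)            # a 2-element tuple type
T1 == T2                  # false: every append yields a new type
```

So appending in a loop forces the compiler to deal with a fresh type on every iteration.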

I do realize that a tuple is immutable, but it turns out that one can alter the array elements even when they’re indexed via the tuple(!).

Yup, that’s a good observation, and it’s true throughout Julia. A tuple (like a struct) is an immutable container, but containers don’t get to decide what you do to the objects they contain. That is, if I have a tuple ([1, 2, 3],), I can never change which array it contains, but I can do whatever I want to modify the contents of that array. Mutability of an object is never affected by what container you put that object in.
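In code:

```julia
t = ([1, 2, 3],)
t[1][1] = 99        # allowed: mutates the array the tuple contains
# t[1] = [4, 5, 6]  # MethodError: can't rebind which array the tuple holds
```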

Regarding your implementation:

a = [Int[]]
pop!(a)

This is a pretty non-idiomatic (and inefficient) approach, because it allocates a brand-new empty array just to throw it away on the next line. If you just want an empty array-of-arrays-of-ints, you can do:

julia> Vector{Int}[]
0-element Array{Array{Int64,1},1}

or, more verbosely:

julia> Vector{Vector{Int}}()
0-element Array{Array{Int64,1},1}

#3

Thanks!

My actual implementation has two collections of arrays. But I followed your “idiomatic verbose” example as follows:

d = Vector{Array{Complex64}}();
r = Vector{Vector{Float64}}();

The arrays contained in d are actually 3D. I don’t fully grok this syntax, but it’s nice not to have to pop!() at the beginning.


#4

One note: Array{Complex64} is actually a non-concrete type (since you haven’t specified the dimension), so a Vector{Array{Complex64}} can’t be stored efficiently. That’s because each element in the vector could potentially be an array with a different number of dimensions.

A concrete implementation would be Vector{Array{Complex64, 3}}.
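You can check this yourself with isconcretetype (on recent Julia versions; older versions called it isleaftype). I'm using Int in place of Complex64 here, since the point is the same for any element type:

```julia
isconcretetype(Array{Int})      # false: number of dimensions left unspecified
isconcretetype(Array{Int, 3})   # true: fully specified, so it can be stored efficiently
```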

I’m not sure if it will help you here, but I’ve found map to be helpful when trying to avoid verbosely describing the type of a container. For example, instead of:

x = Vector{Vector{Int}}()
for i in 1:10 
  push!(x, [i])
end

you can instead do:

x = map(1:10) do i
  [i]
end

and the compiler will correctly infer that x is a vector-of-vector-of-int. So you’ll get the same performance with less code.
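A quick check of the inferred container type:

```julia
x = map(1:10) do i
    [i]
end
x isa Vector{Vector{Int}}   # true: the element type was inferred automatically
```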


#5

Thanks again!

The internal {T,ndims} specification does seem to be important. I’m not running BenchmarkTools yet, but that small change did appear to make my test case snappier.

The bottleneck I’m seeing seems to be related to the amount of file I/O I’m doing, with lots of seeks and reads into an IOBuffer, followed by ProtoBuf.readproto(iob) calls, then unpacking samples from UInt32 arrays (using bit-shift .>>, bit-and .&, and signed.()) and converting to Complex. Our target system will have 10-Gig Ethernet, and I’ll probably be reading files from an NFS-mounted drive.
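For what it's worth, the unpacking step looks roughly like this sketch (the 16/16 bit layout, the fake sample values, and the variable names here are all made up for illustration; the real format comes from the protobuf files):

```julia
raw = UInt32[0x0001_fffe, 0x0003_0004]      # fake packed samples
hi  = signed.((raw .>> 16) .% UInt16)       # high 16 bits -> Int16
lo  = signed.((raw .& 0xffff) .% UInt16)    # low 16 bits  -> Int16
z   = complex.(Float32.(hi), Float32.(lo))  # single-precision complex samples
```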

This is fun.