Questiosn about `where` for function definitions (while unpacking arrays of tuples)

question

#1

The broadcast-call syntax for function with multiple return values results in an array of tuples.
Sometimes, you want the tuple of arrays instead. This comes up when computing
multiple related values on a multidimensional grid. I think I’ve come up with a good solution, using macros in Cartesian. However, this is the first time I’m using some of Julia’s features, and I have a few questions.

The problem:

julia> f(a, b) = a+b, a*b, a-b; # some transform returning multiple values
julia> v = f.(rand(3,1), rand(1,4));
julia> typeof(v)
Array{Tuple{Float64,Float64,Float64},2}

I’d need to store the 3 return values in 3 arrays:

julia> x, y, z = unpack(v);
julia> x
3×4 Array{Float64,2}:
 0.301013  0.888299  1.03866  1.0867
 0.853248  1.44053   1.5909   1.63894
 0.687546  1.27483   1.4252   1.47324

See: Destructuring and broadcast

The solution

After stumbling on the Base.Cartesian package through FastConv.jl (and getting my mind blown), I figured out the following:

@generated function unpack{N,M,U}(v::Array{T,N} where T <: NTuple{M,U})
    quote
        @nexprs $M i -> out_i = similar(v,U)
        @inbounds @nloops $N j v begin
            @nexprs $M i -> ((@nref $N out_i j) = (@nref $N v j)[i])
        end
        return @ntuple $M out
    end
end

Which works correctly, and is faster than other implementations using stuff like first.(v).

julia> v = f.(rand(500,1,1), rand(1,500,500));
julia> @time unpack(v);
  1.376500 seconds (11 allocations: 2.794 GiB, 4.19% gc time)
julia> x, y, z = f.(rand(100,1,1), rand(1,100,100)) |> unpack; # hurray!

I figured out it can be useful for others: https://github.com/spalato/Unpack.jl

Questions

  1. What exactly is going on in the type annotations?
    I couldn’t seem to find the appropriate documentation for where. Specifically, why does where T <: NTuple{M,U} need to be in the parenthesis, while {N,M,U} seem to annotate the function?

  2. I was somehow expecting the function unpack{N,M,U}(...) to provide multiple methods for different values of {N,M,U}. This is apparently not the case. Querying the methods always return the following, no matter how many different argument combinations where called before. Does the {N,M,U} annotation have a purpose beyond defining the variables?

    julia> methods(unpack)
    # 1 method for generic function "unpack":
    unpack(v::Array{T,N} where T<:Tuple{Vararg{U,M}}) where {N, M, U} in Unpack at ...
    
  3. Are all these type annotations really necessary, or am I overdoing this?

  4. Currently, this handles only tuples with elements of the same type (NTuples). This is usually the case in my line of work. Is there a way to obtain the types of the tuple from the type T, to generalize this?

  5. This implementation requires the allocation of an intermediary array (v in previous examples). This thus uses twice the memory compared to a manual loop. Is there a way to avoid this intermediary allocation (say, ‘hijack’ the broadcast call)?

Cheers!


#2

I probably won’t have a complete answer, but here are a few responses:

  1. Your syntax looks weird because you’re actually mixing two entirely different ways of specifying parametric methods: the pre-v0.6 version with f{T} and the post-v0.6 where T syntax. The former is being removed, since it’s confusing: PSA: Parametric method syntax deprecation about to drop

So instead, an equivalent definition in the modern syntax is:

function unpack(v::Array{T, N}) where {M, U, T <: NTuple{M, U}, N}
   ...

which hopefully makes it more clear what all the type variables are doing.

  1. Your definition actually creates infinitely many methods, since there are infinitely many values of M, U, etc. So it wouldn’t make sense for methods to list all of them. Instead it’s showing you that you’ve defined one method, but with free variables N, M, U which will be filled in from whatever arguments you specify (and thus new native code will be generated for each different N, M, and U)

  2. They’re only necessary if you actually need to refer to the type variables inside the function (which your current implementation does). But I think there’s a cleaner implementation that doesn’t need any of this:

function myunpack(v::Array{<:NTuple{N}}) where {N}
  ntuple(i -> getindex.(v, i), Val{N})
end

this does two levels of broadcasting: for each i in 1:N it calls getindex.(v, i), which itself broadcasts the getindex call across each element of v.

Does this perform well enough for your use case?

Also, please use BenchmarkTools.jl to time your code, to avoid accidentally including compile time.


#3

I also came up with the following implementations, which are indeed cleaner. (I also believe they work for tuples with nonuniform types).

unpack_tup(w) = Tuple((v->v[i]).(w) for i=1:length(w[1]))
unpack_arr(w) = [(v->v[i]).(w) for i=1:length(w[1])]

@time was giving me a performance difference, which is gone using BenchmarkTools.

Interestingly, these perform equivalently (or perhaps slightly faster) than your myunpack despite them not leveraging dispatch at all.

Benchmarks:

for sz in [10 50 100]
    packed = f.(rand(sz,sz,1), rand(1,1,sz))
    r = (@belapsed myunpack($packed))/(@belapsed unpack($packed))
    println("$sz^3 : $r")
end

yields

10^3 : 1.9916196826079609
50^3 : 2.0800071500245783
100^3 : 1.4245802367541702

That’s not a negligible performance gain. I’m guessing this has to do with the order the elements are accessed. Performance is a bit of a secondary concern for my use case: my arrays don’t fit into memory, and I’m writing to files using explicit loops. Unpacking came up annoyingly in the small tests and prototyping.

The documentation is a bit scattered at the moment (which is perfectly understandable). Thanks for your clarifying answers!

(Also, typo in title…)


#4

I’ve come up with a solution that works for both Tuple and NTuple (thus answering my own #4 above):

@generated function unpack(v::Array{T,N}) where {T <: Tuple, N}
    TT = T.types
    M = length(TT)
    quote
        @nexprs $M i -> out_i = similar(v,$TT[i])
        @inbounds @nloops $N j v begin
            @nexprs $M i -> ((@nref $N out_i j) = (@nref $N v j)[i])
        end
        return @ntuple $M out
    end
end

It’s performance is identical to the previous implementation for Arrays of NTuples.

For some reason, unpacking using broadcast-calls is very slow when multiple types are involved. Unpacking using Cartesian performs 20x faster with 1/10th of the memory compared to broadcasting:

function unpack_bc(w::Array{<:Tuple})
    Tuple((v->v[i]).(w) for i=1:length(w[1]))
end

g(a, b) = a+1im*b, a*b, convert(Int, round(a-b))
packed = g.(rand(100,100,1), rand(1,1,100))
println("Broadcast")
@btime unpack_bc($packed)
println("Cartesian")
@btime unpack($packed)

Outputs:

Broadcast
  177.048 ms (10000153 allocations: 350.96 MiB)
Cartesian
  9.585 ms (7 allocations: 30.52 MiB)

Cheers