Destructuring vectors of tuples: proper argument type?

Hello! I would like to write a generic function that, given a vector of tuples that all have the same (but arbitrary) length, such as (for length three):

mytuple = [(1, "a", "α"), (2, "b", "β"), (3, "c", "γ")]

returns separate vectors, one for each tuple position, as follows.

# Write a function used to destructure a vector of tuples into separate, argument-specific vectors
function unzip(tuple)
    return map(collect, zip((Tuple(e) for e ∈ tuple)...))
end

julia> unzip(mytuple)
3-element Vector{Vector}:
 [1, 2, 3]
 ["a", "b", "c"]
 ["α", "β", "γ"]

I have, however, struggled with declaring the appropriate type in the function argument. Suppose I try:

function unzip_new(tuple::Vector{Tuple{Vararg{Any}}})
    return map(collect, zip((Tuple(e) for e ∈ tuple)...))
end

function unzip_new(tuple::Vector{Tuple{Any}})
    return map(collect, zip((Tuple(e) for e ∈ tuple)...))
end

then any call to unzip_new(mytuple) throws a MethodError of the kind “no method matching …”. Why is that? I tried several variations of the type declaration in the function argument, but to no avail.

To give you a bit of context: I need this because I want to call a function that returns multiple values (as a tuple) inside a loop. I would have the loop, or a suitable comprehension, give me the vector of tuples that collects the results, and then “reshape” it to obtain the separate vectors. Not sure if this is the right approach, though.

If you know the number of iterations in the loop, or a not-too-wasteful upper bound on the number of iterations, pre-allocating the vectors to hold the results is efficient.
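
A minimal sketch of that pre-allocation idea, assuming a hypothetical two-output function sumdiff (not from this thread) and a known iteration count n. When only an upper bound is known, sizehint! followed by push! is one possible option:

```julia
# Hypothetical stand-in for a function that returns several values as a tuple
sumdiff(x, y) = (x + y, x - y)

n = 5                              # number of iterations, known in advance
sums  = Vector{Int}(undef, n)      # one pre-allocated vector per output
diffs = Vector{Int}(undef, n)

for i in 1:n
    # Destructure the returned tuple straight into the result vectors
    sums[i], diffs[i] = sumdiff(i, 1)
end

# If only an upper bound is known: reserve capacity, then push! as results arrive
kept = Int[]
sizehint!(kept, n)
for i in 1:n
    s, _ = sumdiff(i, 1)
    isodd(s) && push!(kept, s)     # keep only some of the results
end
```

No intermediate vector of tuples is created in either variant, so nothing needs to be unzipped afterwards.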

You could use this

numbers = getindex.(mytuple, 1) 

or

[ getindex.(mytuple, i) for i in eachindex(first(mytuple)) ]

For the type information, how about

function unzip_new(tuple::Vector{T}) where {T <: Tuple}
    return map(collect, zip((Tuple(e) for e ∈ tuple)...))
end

What do you suggest exactly on this particular point, @Dan? Something like this?

function myfun(x, y)
    return (x^2 + y^2, x^2 - y^2)
end

A = rand(1:10, 1000, 2)
b = zeros(1000, 2)

# Loop over every row index (size(b, 1) alone would yield a single value)
for r ∈ 1:size(b, 1)
    b[r, 1], b[r, 2] = myfun(A[r, 1], A[r, 2])
end

This is very poor code and it probably won’t work, but it serves to make my point (i.e. my question back to you).

Thanks a lot @SteffenPL!!! Both suggestions work. Quick questions:

  1. Do you think the getindex() approach is more efficient?

  2. I don’t understand the syntax after where in your function declaration. Any links where I can read more?

Thanks!!!

Mh, given your last example, I’m not sure if a vector of tuples is really needed in your case.

  • getindex.(mytuple, 1) will allocate a new array, so in that sense it is very inefficient. I think I misunderstood your problem…
  • The where {T <: Tuple} syntax says that T can be anything which is a subtype of Tuple. That can help if you don’t know how to exactly specify it.
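
Why the two original signatures never matched can be checked directly with the subtype operator. The following is a small sketch that assumes the mytuple vector from the first post: tuple types are covariant in their parameters, whereas Vector{...} is invariant in its element type.

```julia
mytuple = [(1, "a", "α"), (2, "b", "β"), (3, "c", "γ")]
T = eltype(mytuple)   # the concrete tuple type of the elements

# Tuple types are covariant: the element type is a subtype of the abstract tuple type...
println(T <: Tuple{Vararg{Any}})                        # true
# ...but parametric types such as Vector are invariant, so this does not carry over
println(typeof(mytuple) <: Vector{Tuple{Vararg{Any}}})  # false
# Tuple{Any} only describes 1-tuples, so it cannot match 3-tuples either
println(T <: Tuple{Any})                                # false
# Vector{T} where {T <: Tuple} (shorthand: Vector{<:Tuple}) accepts any tuple element type
println(typeof(mytuple) <: Vector{<:Tuple})             # true
```

The Julia manual discusses this in the Types chapter, under parametric types and UnionAll types.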

Maybe it would be good to share another minimal example, which shows why exactly you need a vector of tuples.

Was thinking something like:

b = Matrix{Float64}(undef, size(A,1), 2);
for r in axes(b, 1)
    b[r, :] .= myfun(A[r, 1], A[r, 2])
end

Note that the tuples returned by myfun are never stored in a vector of tuples; their values are written straight to their final location in the matrix b.

Thanks a lot for the useful replies, @SteffenPL and @Dan! Since questions came up about what I really need to do and why, here is an almost-functional MWE which should be pretty self-explanatory. The only part that doesn’t work is the one that calls the function f(), but you can replace it with any function you like; I hope you’ll get the point anyway. I’d be glad to hear whether the approach I suggested in my OP still sounds viable, or whether it is so clearly inefficient that a conceptual restructuring of the code is needed. Thanks a lot.

# Load this package for the sake of function 'wsample'
using Distributions

# I need to perform an elaborate Monte Carlo simulation
# I preallocate the entries of my Monte Carlo "estimates"
b = zeros(10, 2)

# Here is a function that generates some data in a vector
# I have a somewhat more complicated version of this
function tech(n)
    return rand(1:10, n * 200)
end

# Here is a function that generates a tuple of two elements
# Each element is a vector of arrays of basically random numbers, but with a shared origin (I)
# I have a way more complicated version of this really
# I cannot reduce the vector of arrays as the nested arrays can be of arbitrary dimension in each vector
function net(I)
    I₁ = [reshape(wsample(I, repeat([1], length(I)), 10), 5, 2) for i ∈ 1:20]
    I₂ = [reshape(wsample(I, repeat([1], length(I)), 10), 2, 5) for i ∈ 1:20]
    return I₁, I₂
end

# Here is the function from my OP, as adjusted by SteffenPL
function unzip(tuple::Vector{T}) where {T <: Tuple}
    return map(collect, zip((Tuple(e) for e ∈ tuple)...))
end

# And here goes the Monte Carlo proper
for r ∈ eachindex(b)
    # Set up a shared vector to sample from
    Iᵣ = tech(20)
    # Create the vectors of interest: they come as vectors (of length 10) of tuples
    Wᵣ = [net(Iᵣ) for t ∈ 1:10]
    # Now I need two vectors, Xᵣ and Yᵣ, which contain only one element of each tuple (first or second)
    # So using the function that I suggested in my OP, it would look something like:
    Xᵣ, Yᵣ = unzip(Wᵣ)
    # Then I would have a function f() that performs some pseudo-data analysis, whichever you prefer (even say sum or mean)
    # The iteration would conclude as follows:
    b[r, 1] = f(Xᵣ)
    b[r, 2] = f(Yᵣ)
end

I would do it like this:

using Distributions
using Random
using Statistics


# Julia is column major, 
# therefore the number of monte carlo steps goes into the second dimension
b = zeros(2, 10)  

# preallocate data for each step
Iᵣ = Vector{Int64}(undef, 20 * 200)
Xᵣ = Array{Float64}(undef, 2, 5, 20, 10)  # again, column major
Yᵣ = Array{Float64}(undef, 5, 2, 20, 10)  # ...

function tech!(Iᵣ, n)
    rand!(DiscreteUniform(1,10), Iᵣ)
end

function net!(Xᵣₜ, Yᵣₜ, I)
    w = fill(1, length(I))
    for i in 1:20 
        wsample!(I, w, view(Xᵣₜ, :, :, i))
        wsample!(I, w, view(Yᵣₜ, :, :, i))
    end
end

for r ∈ axes(b,2)
    
    tech!(Iᵣ, 20)
    
    for t in 1:10
        net!(view(Xᵣ,:,:,:,t), view(Yᵣ,:,:,:,t), Iᵣ)
    end
    
    b[1, r] = mean(Xᵣ)
    b[2, r] = std(Yᵣ)
end

Some comments:

  • In your code it should be axes(b,1) instead of eachindex.
  • In Julia, the elements of a matrix column are stored next to each other in memory. Therefore it is better to pick dimensions such that computations are done per column instead of per row.
  • Instead of converting into tuples back and forth, I would recommend preallocating arrays with the needed dimension. Then just use view to write into these arrays.
  • I also used the in-place variants rand! and wsample! as a demonstration :wink:
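
The column-wise layout and the use of view from the comments above can be illustrated on a toy problem. This is only a sketch: the 2×4 matrix results and the values written into it are made up for illustration and are not part of the Monte Carlo code.

```julia
results = zeros(2, 4)              # two outputs per step, one column per step

for r in axes(results, 2)
    col = view(results, :, r)      # a view shares memory with `results`, no copy
    col .= (r, r^2)                # in-place write of a 2-tuple into column r
end
```

After the loop, column r of results holds r and r^2, and no temporary arrays were created for the columns.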

The problem you came across is also rather common. There are several packages which address the issues that occur when one has ‘Vectors of Tuples/Structs’ with possibly different dimensions. See for example jonniedie/ComponentArrays.jl: Arrays with arbitrarily nested named components (github.com). This even allows nested arrays with different dimensions…

Tried to make the code a little bit more compact (mostly by removing comments :stuck_out_tongue_winking_eye:):

using Distributions

tech(n) = rand(1:10, n * 200)

function net(I)
    uniform = ones(size(I))
    I₁ = [wsample(I, uniform, (5, 2)) for _ ∈ 1:20]
    I₂ = [wsample(I, uniform, (2, 5)) for _ ∈ 1:20]
    return I₁, I₂
end

unzip(tup) = collect.(zip(tup...))

f(Q) = 1.0 # The input Q is pretty weird, so I made it simple

function doit(N = 10)
    res = Matrix{Float64}(undef, N, 2)
    for r ∈ axes(res,1)
        Iᵣ = tech(20)

        Wᵣ = [net(Iᵣ) for _ ∈ 1:10]
        Xᵣ, Yᵣ = unzip(Wᵣ)

        res[r, 1] = f(Xᵣ)
        res[r, 2] = f(Yᵣ)
    end
    return res
end

b = doit()

Making the code work is a big plus. Keeping all the computation inside functions is also the Julian way. @SteffenPL had some more good changes in his answer (to reduce allocations). But it feels like the problem is a bit too sanitized (compared to the original source) to know what can and cannot be optimized.

As for the subject-line question of the OP, there is a cleaner version of unzip in the code above, and there is no need to add too many type annotations in Julia, as the arguments “arrive” at the method with their own types, which the compiler infers and specializes on (provided the code is type-stable). Making the code type-stable is one of the main reasons to put all the code in functions and avoid using global variables within functions.
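
To illustrate the point about annotations, here is a small sketch with two hypothetical functions, sumsq and sumsq_vec (neither is from this thread). The annotation does not change what the function computes; it only restricts which argument types are accepted.

```julia
# No annotation: works for any iterable of numbers;
# Julia compiles specialized code for each concrete argument type it is called with
sumsq(v) = sum(x -> x^2, v)

# Annotated: same body, but only Vector arguments with real elements are accepted
sumsq_vec(v::Vector{<:Real}) = sum(x -> x^2, v)

sumsq([1, 2, 3])       # Vector{Int} argument
sumsq(1:3)             # a range works too
sumsq_vec([1, 2, 3])   # same result as sumsq
# sumsq_vec(1:3)       # would throw a MethodError: a UnitRange is not a Vector
```

In the REPL, @code_warntype sumsq([1, 2, 3]) is one way to check whether the types were inferred for a given call.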

@SteffenPL and @Dan, thank you so much for all the advice and hints. I am learning so much. I am happy and proud to have switched to Julia and entered this community! I hope to contribute to it in the future.

Have you checked whether this works for you?

collect.(mytuple)

julia> mytuple_neq = [(1, "a", "α"), (2, "b"), (3,)]
3-element Vector{Tuple{Int64, Vararg{String}}}:
 (1, "a", "α")
 (2, "b")
 (3,)

julia> collect.(mytuple_neq)
3-element Vector{Vector}:
 Any[1, "a", "α"]
 Any[2, "b"]
 [3]
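
For comparison, a short sketch (reusing mytuple from the first post and the compact unzip from earlier in the thread) of how the two calls differ: collect.(mytuple) turns each tuple into its own vector, whereas unzip groups the elements by position.

```julia
mytuple = [(1, "a", "α"), (2, "b", "β"), (3, "c", "γ")]
unzip(tup) = collect.(zip(tup...))

by_tuple    = collect.(mytuple)   # one vector per tuple
by_position = unzip(mytuple)      # one vector per tuple position

println(by_tuple[1])       # Any[1, "a", "α"]
println(by_position[1])    # [1, 2, 3]
```

Note that zip stops at the shortest input, so unzip on tuples of unequal length such as mytuple_neq would only return the first position.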