Tuple of Tuple to Tuple

Hello,

I have 2 variables like

julia> a = 1:5
1:5

julia> b = [("a", 10), ("b", 20), ("c", 30), ("d", 40) , ("e", 50)]
5-element Vector{Tuple{String, Int64}}:
 ("a", 10)
 ("b", 20)
 ("c", 30)
 ("d", 40)
 ("e", 50)

When I create an iterator and collect it I get

julia> collect(zip(a, b))
5-element Vector{Tuple{Int64, Tuple{String, Int64}}}:
 (1, ("a", 10))
 (2, ("b", 20))
 (3, ("c", 30))
 (4, ("d", 40))
 (5, ("e", 50))

I’d like to get

5-element Vector{Tuple{Int64, String, Int64}}:
 (1, "a", 10)
 (2, "b", 20)
 (3, "c", 30)
 (4, "d", 40)
 (5, "e", 50)

Same question for getting a Vector of NamedTuples (after collecting the iterator), like so

 (c1=1, c2="a", c3=10)
 (c1=2, c2="b", c3=20)
 (c1=3, c2="c", c3=30)
 (c1=4, c2="d", c3=40)
 (c1=5, c2="e", c3=50)

I’m looking for a solution with iterators that works when the Tuples in b have any number of elements

Kind regards

You got close:

julia> [(ai, bi...) for (ai, bi) in zip(a, b)]
5-element Vector{Tuple{Int64, String, Int64}}:
 (1, "a", 10)
 (2, "b", 20)
 (3, "c", 30)
 (4, "d", 40)
 (5, "e", 50)

You could iterate over collect(zip(a, b)) instead of zip(a, b), but why waste a step?
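
The splat in (ai, bi...) does not depend on the length of the inner tuple, so the same comprehension should also cover the "any number of elements" case. A minimal sketch, where b3 is a made-up input with three-element tuples (not from the original question):

```julia
a = 1:3

# Hypothetical input: tuples of length 3 instead of 2
b3 = [("a", 10, 1.5), ("b", 20, 2.5), ("c", 30, 3.5)]

# bi... splats every element of the inner tuple, whatever its length
flat = [(ai, bi...) for (ai, bi) in zip(a, b3)]
```

Each element of flat is then a single flat tuple such as (1, "a", 10, 1.5).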

Why don’t you consider DataFrames.jl? Maybe it’s a perfect solution to your problem.

julia> using DataFrames

julia> X = DataFrame(
       [1 "a" 10
        2 "b" 20
        3 "c" 30
        4 "d" 40
        5 "e" 50], [:c1, :c2, :c3])
5×3 DataFrame
 Row │ c1   c2   c3
     │ Any  Any  Any
─────┼───────────────
   1 │ 1    a    10
   2 │ 2    b    20
   3 │ 3    c    30
   4 │ 4    d    40
   5 │ 5    e    50

julia> X.c2
5-element Vector{Any}:
 "a"
 "b"
 "c"
 "d"
 "e"

julia> X.c3[3]
30

Thanks @Benny, but how do I get an iterator of that?

And how do I get an iterator of NamedTuples?

PS : I’m looking for a solution without a library such as DataFrames

As in a lazy iterable without collecting? Just surround the comprehension with () instead of []; that makes it a generator expression. I didn’t say anything about NamedTuples because you didn’t provide an alternate input for that case, but rereading it, maybe you wanted that output from those same inputs:

julia> nt = (NamedTuple{(:c1, :c2, :c3)}( (ai, bi...) ) for (ai, bi) in zip(a, b))
Base.Generator{Base.Iterators.Zip{Tuple{UnitRange{Int64}, Vector{Tuple{String, Int64}}}}, var"#3#4"}(var"#3#4"(), zip(1:5, [("a", 10), ("b", 20), ("c", 30), ("d", 40), ("e", 50)]))

julia> collect(nt)
5-element Vector{@NamedTuple{c1::Int64, c2::String, c3::Int64}}:
 (c1 = 1, c2 = "a", c3 = 10)
 (c1 = 2, c2 = "b", c3 = 20)
 (c1 = 3, c2 = "c", c3 = 30)
 (c1 = 4, c2 = "d", c3 = 40)
 (c1 = 5, c2 = "e", c3 = 50)
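
If the tuples in b can have any length, the names can be generated instead of hard-coded. The following is only a sketch under that assumption; colnames, to_row and lazy_rows are hypothetical helper names introduced here for illustration, not functions from Base:

```julia
a = 1:5
b = [("a", 10), ("b", 20), ("c", 30), ("d", 40), ("e", 50)]

# Hypothetical helper: builds the names (:c1, :c2, ..., :cn)
colnames(n) = ntuple(i -> Symbol(:c, i), n)

# Hypothetical helper: flattens one (ai, bi) pair and attaches the generated names
function to_row(ai, bi)
    t = (ai, bi...)
    return NamedTuple{colnames(length(t))}(t)
end

# Parentheses make this a lazy generator; no row is built until it is iterated
lazy_rows(a, b) = (to_row(ai, bi) for (ai, bi) in zip(a, b))

rows = lazy_rows(a, b)
first(rows)   # computes only the first row: (c1 = 1, c2 = "a", c3 = 10)
```

Calling collect(rows) gives the same Vector of NamedTuples as above, while a for loop over rows never materializes the whole vector.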

Once you are collecting NamedTuples, I would also suggest a tabular structure from a library like DataFrames.jl, TypedTables.jl, etc. Many of these implement the Tables.jl interface, with different features and flexibility, and row iteration in some of them computes such NamedTuples lazily. You seem concerned about wasting time making independent copies of memory, but you already need to instantiate a and b, and tabular structures can often hold references to them instead of copying. If you need their features, there’s no reason to avoid packages.

This could be a way to use DataFrames and Tables functions for this purpose:

julia> using DataFrames, Tables

julia> df = transform(DataFrame(; a, b), :b => [:b, :c])
5×3 DataFrame
 Row │ a      b       c
     │ Int64  String  Int64
─────┼──────────────────────
   1 │     1  a          10
   2 │     2  b          20
   3 │     3  c          30
   4 │     4  d          40
   5 │     5  e          50

julia> cnt=collect(Tables.namedtupleiterator(df))
5-element Vector{NamedTuple{(:a, :b, :c), Tuple{Int64, String, Int64}}}:       
 (a = 1, b = "a", c = 10)
 (a = 2, b = "b", c = 20)
 (a = 3, b = "c", c = 30)
 (a = 4, b = "d", c = 40)
 (a = 5, b = "e", c = 50)

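Tuple.(cnt)                       # plain tuples from the NamedTuples

# or, directly from the rows
copy.(eachrow(df))                # NamedTuples (copy of a DataFrameRow)
[(r...,) for r in eachrow(df)]    # plain tuples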

If you are not fundamentally against any dependencies, consider the lightest-weight (while fully functional and very popular) columnar table/array package, StructArrays.jl. It is a great fit for a wide range of use cases: from when regular vectors of tuples or namedtuples start becoming inconvenient, to all kinds of advanced tabular data manipulation functions. Seems like you are somewhere in this range (:

Could StructArray allow a solution substantially different from the one proposed by Benny?
I also mean easier to implement by a user with not much experience

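a = 1:5
b = [("a", 10), ("b", 20), ("c", 30), ("d", 40), ("e", 50)]

# lazy generator of NamedTuples; the (y, z) pattern assumes 2-element tuples in b
genNT(a, b) = ((a=x, b=y, c=z) for (x, (y, z)) in zip(a, b))
collect(genNT(a, b))

# lazy generator of flat tuples
gentup(a, b) = ((x, y, z) for (x, (y, z)) in zip(a, b))
collect(gentup(a, b))

# the same generators can feed a StructArray directly
using StructArrays
StructArray((x, y, z) for (x, (y, z)) in zip(a, b))
StructArray((a=x, b=y, c=z) for (x, (y, z)) in zip(a, b))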

Is there, for example, a flatten() function that might be useful in this case or in more general cases with many nested levels?
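
For deeper nesting, one possible approach is a small recursive helper. This is only a sketch: flatten_tuple is a hypothetical name defined here, not a Base function. (Iterators.flatten in Base removes one level of iteration, but it iterates over any iterable element, strings included, so it does not seem to be a drop-in answer for tuples that contain strings.)

```julia
# Hypothetical helper: recursively flattens arbitrarily nested tuples.
flatten_tuple(x) = (x,)          # non-tuple leaf: wrap it in a 1-tuple
flatten_tuple(::Tuple{}) = ()    # empty tuple: nothing left to add
flatten_tuple(t::Tuple) =
    (flatten_tuple(first(t))..., flatten_tuple(Base.tail(t))...)

a = 1:5
b = [("a", 10), ("b", 20), ("c", 30), ("d", 40), ("e", 50)]

# Lazy generator of flat tuples, independent of the nesting depth
flat_rows = (flatten_tuple(t) for t in zip(a, b))

# Works on deeper nesting as well
deep = flatten_tuple((1, ("a", (10, 2.5))))   # (1, "a", 10, 2.5)
```

Strings and other non-tuple values are treated as leaves, so only the tuple nesting is removed.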

I had no deeper nesting levels. This is very elegant code. Thanks @rocco_sprmnt21 for your help!