Adding list of tuples

I have two long lists of tuples, for instance A1 = [(2,3),(4,5)…] and A2 = [(4,6),(2,8)…]. The first elements of the tuples are common to both lists. I want to produce a new list, A3, such that whenever the first elements match, the second elements are added. I can do it with a for loop and an if condition, as shown below:

A3 = []
for i in A1
    for j in A2
        if i[1] == j[1]
            push!(A3, (i, i[2] + j[2]))
        end
    end
end

Is there any way to achieve the same result in one line, using something like the filter function? Thank you in advance.

You can probably use Iterators.product to replace the double loop, then filter or Iterators.filter to replace the if, and then map or Iterators.map to replace the push!.
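A minimal sketch of that suggestion, assuming the small example lists from the question (Iterators.map needs Julia 1.6 or later). Iterators.product varies its first argument fastest, so A2 is passed first here to reproduce the order of the original loop:

```julia
A1 = [(2, 3), (4, 5)]
A2 = [(4, 6), (2, 8)]

# every (j, i) combination; A1 plays the role of the outer loop
combos = Iterators.product(A2, A1)

# keep only the combinations whose first elements match (the `if`)
matches = Iterators.filter(((j, i),) -> i[1] == j[1], combos)

# build the output tuples (the `push!`), same shape as in the question
A3 = collect(Iterators.map(((j, i),) -> (i, i[2] + j[2]), matches))
```

Everything stays lazy until collect, so no intermediate array of all combinations is allocated, but the work is still proportional to length(A1) * length(A2).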


Use Tuple{Int,Int}[] here if performance is of any concern.

(Given that, and with the code inside a function, the loop is fine imo. I don’t see a one-liner making it clearer.)
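For illustration, a sketch of what that could look like. The function name add_matching is made up for this example, and it assumes the wanted output is (key, sum) pairs of Ints, which is what fits in a Tuple{Int,Int}[]:

```julia
# Hypothetical helper: same nested loop as in the question,
# but inside a function and with a concretely typed result vector.
function add_matching(A1, A2)
    A3 = Tuple{Int,Int}[]
    for i in A1, j in A2
        if i[1] == j[1]
            push!(A3, (i[1], i[2] + j[2]))
        end
    end
    return A3
end

result = add_matching([(2, 3), (4, 5)], [(4, 6), (2, 8)])
```

If the whole tuple i should be kept in the output, as in the original push!, the element type would have to be Tuple{Tuple{Int,Int},Int} instead.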

D = Dict(first.(A1) .=> last.(A1))   # key => value lookup built from A1
[(k, v + D[k]) for (k, v) in A2 if haskey(D, k)]

This can also fit in one line ;). It assumes the first elements of the tuples are unique within A1 and within A2, which looks reasonable from the question.

In essence, this is a database join operation (it might be more efficient to use a database for massive A1 and A2).


This could be a solution. It also works if the first element of the pair is not unique.

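using DataFrames

A1 = [(2,3),(4,5)]
A2 = [(4,6),(2,8)]
df1 = DataFrame(x=first.(A1), y=last.(A1))
df2 = DataFrame(x=first.(A2), y=last.(A2))

# stack both tables, then sum y within each group of equal x
combine(groupby(vcat(df1, df2), :x), :y => sum)

# alternative: a single table, with duplicates summed by unstack
A = vcat(A1, A2)
df = DataFrame(x=first.(A), y=last.(A))
udf = unstack(df, :x, :y, valuestransform=sum)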


But perhaps the most “natural” is the following:


d1=Dict(Pair(e...) for e in A1)
d2=Dict(Pair(e...) for e in A2)

mergewith(+, d1, d2)

Perhaps it could be written more simply as:

d = mergewith(+, Dict(A1), Dict(A2))

And then to get the output as per OP do:

A3 = [(a, d[a[1]]) for a in A1]

I just want to stress that:

  1. Dicts are almost certainly slower than vectors of tuples.
  2. There is no reason whatsoever to use any package or fancy syntax for this. The OP’s original proposal is perfectly fine if Tuple{Int,Int}[] is used to initialize the resulting array and everything is put inside a function.
  3. That will almost certainly be faster than any of the alternatives proposed here.
  4. IMO, the loop is much clearer.

The fact that one can write something like what the OP did and get close to the best possible performance, just by being explicit about the logic of what one wants to do, is a fundamental feature of Julia.

edit: I’m not sure whether the proposals here do the same thing as the OP’s code (or whether they are what was expected). It seems people assumed that the first element of each tuple is equivalent to a dictionary key, and I’m not sure that is the case (are they unique?). More information is probably needed to decide what the best approach is.
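To make that concern concrete, here is a small sketch with made-up data in which key 2 appears twice in A2 and key 7 appears only in A1. These inputs are assumptions chosen for illustration, not data from the question:

```julia
A1 = [(2, 3), (4, 5), (7, 1)]
A2 = [(4, 6), (2, 8), (2, 10)]

# Loop semantics: one output per matching pair, unmatched keys dropped
loop_result = [(i[1], i[2] + j[2]) for i in A1 for j in A2 if i[1] == j[1]]

# Dict semantics: Dict(A2) keeps only the last value stored for key 2,
# and mergewith also keeps key 7 even though it has no partner in A2
merged = mergewith(+, Dict(A1), Dict(A2))
```

Here loop_result is [(2, 11), (2, 13), (4, 11)], while merged holds 2 => 13, 4 => 11 and 7 => 1. The two approaches only agree when the keys are unique within each list and present in both.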


It’s good that Julia can make loops performant when they are needed. However, capturing the logic of the operation in a named function like map allows thinking and communicating at a higher level than state-machine operations, so I like to use these functions when I can.

Here’s a C++ perspective on it:

https://belaycpp.com/2021/06/22/dont-use-raw-loops/

There are cases and cases, but I don’t generally agree with that. Very often code becomes impossible to understand after being written with clever combinations of higher-level functions. Many, many times, the loop is by far the clearest thing to read.


I have nothing new to add, except that this is how I’d write the OP’s loop:

A3 = Tuple{Int,Int}[]
for i in A1, j in A2
    i[1] == j[1] && push!(A3, (i[1], i[2] + j[2]))
end

(Pretty much the same!)


The requested operation is a join of these two lists.

julia> using FlexiJoins

julia> map(p -> (p.A1, p.A1[2] + p.A2[2]), innerjoin((;A1, A2), by_key(first)))
2-element StructArray(::Vector{Tuple{Int64, Int64}}, ::Vector{Int64}) with eltype Tuple{Tuple{Int64, Int64}, Int64}:
 ((2, 3), 11)
 ((4, 5), 11)

Putting it all together, it could look like this:

[Tuple(d) for d in mergewith(+, Dict(A1), Dict(A2))]
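
One thing to keep in mind with this version: a Dict does not promise any particular iteration order, so the tuples may come out in a different order than in A1. A small sketch, assuming the example lists from earlier in the thread and that ordering by the first element is acceptable:

```julia
A1 = [(2, 3), (4, 5)]
A2 = [(4, 6), (2, 8)]

# sort compares tuples lexicographically, so this orders by first element
A3 = sort([Tuple(d) for d in mergewith(+, Dict(A1), Dict(A2))])
```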