Set-like operations on DataFrames?

How can I merge two DataFrames that have the same columns, in a way similar to set operations?
For example, are there functions like

df = union(df1, df2, on=:ID)
df = intersect(df1, df2, on=:ID)
df = diff(df1, df2, on=:ID)

Can you please explain what you mean by “similar to set operations”? Can you please give some example and an expected output?

You can look up SQL-style joins in Julia, which will answer some of your questions.

https://juliadata.github.io/DataFrames.jl/stable/man/joins/

An intersect is basically an inner join.
A union can be accomplished with vcat() followed by unique().
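
If the comparison should be made on a key column only, the join functions in DataFrames.jl get fairly close. Here is a minimal sketch, assuming made-up sample data and that ID is unique within each DataFrame; the variable names are only for illustration:

```julia
using DataFrames

# Hypothetical sample data; ID is assumed to be a unique key in each frame.
df1 = DataFrame(ID = [1, 2, 3], x = ["a", "b", "c"])
df2 = DataFrame(ID = [2, 3, 4], x = ["b", "c", "d"])

# "Intersect" on the key: rows of df1 whose ID also appears in df2.
both = semijoin(df1, df2, on = :ID)

# "Difference" on the key: rows of df1 whose ID does not appear in df2.
only1 = antijoin(df1, df2, on = :ID)

# "Union" on the key: all of df1 plus the rows of df2 whose ID is not in df1.
either = vcat(df1, antijoin(df2, df1, on = :ID))
```

Note that these only compare the ID column, so when two rows share an ID but differ in other columns, the row from df1 is the one that is kept.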

What are you trying to accomplish by diff?

Treat each row as an element in a set. This is viable because we assume the data has unique keys. Then perform set-like operations such as

julia> a = Set([1 2 3])
Set([2, 3, 1])

julia> b = Set([2 3 4])
Set([4, 2, 3])

julia> union(a, b)
Set([4, 2, 3, 1])

julia> intersect(a, b)
Set([2, 3])

julia> setdiff(a, b)
Set([1])

Note: replace the elements with rows of a DataFrame.

Thanks, but it's not SQL-style joins. It’s set-style operations. I gave examples in the reply above.

This is what I assumed, but your definition has an “on” keyword argument, which I do not understand in the context of your answer.

Without the “on” argument you can do, e.g.:

DataFrame(union(eachrow.([df1, df2])...))

or

DataFrame(union(Tables.rowtable.([df1, df2])...))

(and the same with intersect and setdiff)
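
For a concrete picture, here is a small sketch of the Tables.rowtable variant with all three operations. The sample data is made up, and it assumes Tables.jl is installed alongside DataFrames.jl:

```julia
using DataFrames, Tables

# Hypothetical sample data with identical column names and types.
df1 = DataFrame(ID = [1, 2, 3], x = ["a", "b", "c"])
df2 = DataFrame(ID = [2, 3, 4], x = ["b", "c", "d"])

# Each DataFrame becomes a Vector of NamedTuples, one per row.
rows1 = Tables.rowtable(df1)
rows2 = Tables.rowtable(df2)

u = DataFrame(union(rows1, rows2))      # rows found in either frame, duplicates dropped
i = DataFrame(intersect(rows1, rows2))  # rows found in both frames
d = DataFrame(setdiff(rows1, rows2))    # rows of df1 that are not in df2
```

Rows are compared as a whole here, so two rows with the same ID but different values in other columns count as different elements.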

This is clever!

I didn’t know that union can operate on iterators. Cool!!