Based
July 9, 2021, 5:43am
1
I have 2 dataframes. Each contains independent datapoints in different rows with different ID
s. They share most of the same columns, except for one. I want to combine them into one data frame. I feel like there should be a function that does it, but I canβt find it. So far as I can tell outerjoin
should do this, but it doesnβt work:
using DataFrames
test1 = DataFrame(ID=1:3, Name=["A", "B", "C"], Argh=[5.0, 62.1, 89.0])
test2 = DataFrame(ID=4:6, Name=["D", "E", "F"])
testmerge = outerjoin(test1, test2, on=:ID)
result
ERROR: LoadError: ArgumentError: Duplicate variable names: :Name. Pass makeunique=true to make them unique using a suffix automatically.
Have you tried to do, what they say in error message? Pass makeunique=true
?
1 Like
I agree with the above, although it seems to me like you are just trying to vertically concatenate the DataFrames rather than join
them?
julia> vcat(test1, test2, cols = :union)
6Γ3 DataFrame
Row β ID Name Argh
β Int64 String Float64?
ββββββΌββββββββββββββββββββββββββ
1 β 1 A 5.0
2 β 2 B 62.1
3 β 3 C 89.0
4 β 4 D missing
5 β 5 E missing
6 β 6 F missing
(note the cols = :union
argument which fills in the missing Argh
column in test2
with missing
)
4 Likes
Based
July 9, 2021, 6:16am
4
makeunique=true
creates a bizarre dataframe with duplicate columns:
julia> testmerge
6Γ4 DataFrame
Row β ID Name Argh Name_1
β Int64 String? Float64? String?
ββββββΌββββββββββββββββββββββββββββββββββββ
1 β 1 A 5.0 missing
2 β 2 B 62.1 missing
3 β 3 C 89.0 missing
4 β 4 missing missing D
5 β 5 missing missing E
6 β 6 missing missing F
Thatβs not bizarre at all, thatβs just what you asked for - you are join
ing two DataFrames, both with a Name
column, but youβre not joining on Name
, so the new DataFrame will have two Name
columns. As I said above, I think youβre just not looking for a join hereβ¦
2 Likes
If you feel confortable with SQL you can also do:
using SQLdf
test1 = DataFrame(ID=1:3, Name=["A", "B", "C"], Argh=[5.0, 62.1, 89.0])
test2 = DataFrame(ID=4:6, Name=["D", "E", "F"])
testmerge = sqldf("""
select * from test1
union
select *, 0.0 from test2
""")
6Γ3 DataFrame
Row β ID Name Argh
β Int64 String Float64
ββββββΌββββββββββββββββββββββββ
1 β 1 A 5.0
2 β 2 B 62.1
3 β 3 C 89.0
4 β 4 D 0.0
5 β 5 E 0.0
6 β 6 F 0.0
1 Like