The problem I’m having is I can’t figure out how to move data from one DataFrame to a different one. Let’s say I have two DataFrames.
df1 is:
Team Rating
Chi 10
Ny 8
Oak 6
Ott 6
df2 is:
HomeTeam HomeRat AwayTeam AwayRat
Chi missing Oak missing
Ny missing Ott missing
I want to be able to check if the home team or away the team is in df2 is the same as in Team in df1 and if it is move that value into the correct column. I’ve been trying over and over to try and figure this out with for loops and broadcasting but I just can’t get it.
What you are asking for is often referred to as an update functionality. I think this isn’t yet available in DataFrames, but there’s an issue for it somewhere (on my phone so a bit cumbersome to check).
The way to do it currently is:
leftjoin(df2, df1, on = :HomeTeam => :Team)
And then if you want to replace the missing values in the HomeRat column do
df2.HomeRat = coalesce.(df2.HomeRat, df2.Rating)
You can then do the same joining on the :AwayTeam column in the first step.
An alternative approach is to create a dictionary from your first DataFrame and then simply replace the columns in the second DataFrame like this:
df1dict = Dict(df1.Team .=> df1.Rating)
df2.HomeRat = [df1dict[x] for x in df2.HomeTeam]
df2.AwayRat = [df1dict[x] for x in df2.AwayTeam]
I have found this leftjoin_or_fill! function useful, although I can’t quite tell if it’s what you want here.
"""
leftjoin_or_fill!(df1, df2; on)
Adds columns from `df2` to `df1` using `leftjoin!`, except if there are duplicate columns
between `df1` and `df2`, then [`choose_nonmissing`](@ref) is used to merge them.
"""
function leftjoin_or_fill!(df1, df2; on)
if !(on isa AbstractVector)
on = [on]
end
shared_cols = setdiff(intersect(names(df1), names(df2)), string.(on))
leftjoin!(df1, df2; on, makeunique=true)
for col in shared_cols
df1[!, col] = choose_nonmissing.(df1[!, col], df1[!, string(col, "_1")]; col)
end
select!(df1, Not([string(col, "_1") for col in shared_cols]))
return df1
end
where
"""
choose_nonmissing(x, y; col="")
* If exactly one of `x` or `y` is non-`missing`, return it.
* If both are missing, return `missing`
* If both are non-missing, and `x != y`, throw an error. Otherwise return `x`.
"""
function choose_nonmissing(x, y; col="")
if ismissing(x)
return y
elseif ismissing(y)
return x
else
if x != y
error("Given non-missing values $x and $y which should be equal for column $col.")
end
return x
end
end
Thank you all so much for the for the help. I had to be away from my computer for a couple days but I was able to do what I was trying to do pretty quickly because of you all. If you’re interested this is what I ended up doing.