I have a data similar to the following in which one column is partly defined in one DataFrame and partly in a second
using DataFrames, DataFramesMeta
# Data situation:
df1 = DataFrame(id1 = [1, 2, 3, 4], id2 = [missing, missing, 'a', 'b'])
df2 = DataFrame(id1 = [1, 2], id2 = ['c', 'd'])
Currently I solve this by joining the DataFrames and then creating a new variable based on the old ones:
# Approach:
df = join(df1, df2, on = :id1, kind = :left, makeunique = true)
df = @transform(df, id2_all = ifelse.(ismissing.(:id2), :id2_1, :id2))
df = select(df, :id1, :id2_all)
rename!(df, :id2_all =>:id_2)
Is there a smarter way of doing this? Thanks!
You can write:
df1.id2[df2.id1] = df2.id2
for an in-place operation.
It will overwrite all values matching :id1
in df2
though (and assumes all df2
is a subset of df1
) - I understand this is what you want - your code does something a bit different though.
If you want to reproduce exactly what you have in your question do
idxs = intersect(df1.id1, df2.id1)
v = view(df1.id2, idxs)
v = ifelse.(ismissing.(v), view(df2.id2, idxs), v)
(this is also in-place)
2 Likes
Thanks for your response! I think I oversimplified the example a bit too much. id1
are not actually row indicators but rather some random id numbers and I am not certain the data is sorted correctly so I would like to do some kind of joining to make sure the ids align correctly. Therefore, I came up with the following based on your answer:
idxs = findall(ismissing.(df1.id2))
v = view(df1, idxs, :)
idys = map(x -> findfirst(x .== df2.id1), v.id1)
v.id2 .= df2.id2[idys]
Yes - joining is always tricky . Happy that in the end you have what you needed.
The crucial point is (and I see you know this, but just want to note this here) that DataFrames.jl supports views and broadcasting that is 100% consistent with Base, so whatever you expect to work in Base should directly translate do data frame context.
2 Likes