Differences in Two DataFrames

Given two DataFrames of different lengths, and different parameter names…

julia> dfA = DataFrame(:Parameter => ["size", "weight", "color", "smell"],
       :Value => ["small", "120", "red", "none"])
4×2 DataFrame
 Row │ Parameter  Value  
     │ String     String 
   1 │ size       small
   2 │ weight     120
   3 │ color      red
   4 │ smell      none
julia> dfB = DataFrame(:Parameter => ["size", "color", "smell", "temperature", "price"],
       :Value => ["small", "blue", "none", "hot", "50"])
5×2 DataFrame
 Row │ Parameter    Value  
     │ String       String 
   1 │ size         small
   2 │ color        blue
   3 │ smell        none
   4 │ temperature  hot
   5 │ price        50

I’d like a resulting DataFrame containing only the rows with non-matching Values, from dfB’s perspective. In this example, “weight” is not in dfB, so it should be ignored; “color” has a different Value in dfB, and “temperature” and “price” are new parameters, so all three should register as differences. The expected result in this example is a DataFrame like this:

 Row │ Parameter    Value  
     │ String       String 
   1 │ color        blue
   2 │ temperature  hot
   3 │ price        50

I’m experimenting with the filter() function, but haven’t made it work. I’m pretty sure I can do this using a for loop, but I’d like to avoid that if possible. Any suggestions? Thanks in advance!

I don’t think such a thing exists. See an earlier thread.

Try this:

antijoin(dfB, dfA, on=[:Parameter, :Value])
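
For anyone who wants to paste this into a fresh session, here is a minimal self-contained sketch using the sample data from the question. The name `result` is just for illustration.

```julia
using DataFrames

dfA = DataFrame(:Parameter => ["size", "weight", "color", "smell"],
                :Value => ["small", "120", "red", "none"])
dfB = DataFrame(:Parameter => ["size", "color", "smell", "temperature", "price"],
                :Value => ["small", "blue", "none", "hot", "50"])

# Keep the rows of dfB for which no row of dfA matches on BOTH columns.
result = antijoin(dfB, dfA, on = [:Parameter, :Value])
```

Joining on both columns is what makes “color” show up: its Parameter exists in dfA, but the (Parameter, Value) pair does not. With on=:Parameter alone, only “temperature” and “price” would be returned.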

antijoin() certainly works on this sample data set, thanks! I’ll try it with the real data, but I don’t see why it would fail. I’ll report back, thanks, again!

EDIT: As expected, it works perfectly. I wasn’t familiar with the function, so thank you, @rocco_sprmnt21.

Some variants and alternatives:

# Each line keeps the rows of dfB that have no identical row in dfA.
DataFrame([x for x in eachrow(dfB) if x ∉ eachrow(dfA)])
eachrow(dfB)[findall(x -> x ∉ eachrow(dfA), eachrow(dfB))]
dfB[findall(x -> x ∉ eachrow(dfA), eachrow(dfB)), :]
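
Since the question mentioned filter(), here is one way it could be written. This is only a sketch: the helper names `seen` and `not_in_A` are made up for illustration, and building a Set of (Parameter, Value) pairs is my own choice to avoid rescanning dfA for every row of dfB.

```julia
using DataFrames

dfA = DataFrame(:Parameter => ["size", "weight", "color", "smell"],
                :Value => ["small", "120", "red", "none"])
dfB = DataFrame(:Parameter => ["size", "color", "smell", "temperature", "price"],
                :Value => ["small", "blue", "none", "hot", "50"])

# Collect dfA's (Parameter, Value) pairs once, for fast membership tests.
seen = Set(zip(dfA.Parameter, dfA.Value))

# Predicate applied to each row of dfB; it receives the two column values.
not_in_A(p, v) = (p, v) ∉ seen

# filter keeps the rows of dfB for which the predicate returns true.
result = filter([:Parameter, :Value] => not_in_A, dfB)
```

filter() returns a new DataFrame and leaves dfB untouched, with the kept rows in their original dfB order.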

This is very good info, which I’m sure I’ll reference in the future. Thanks again! My attempt with setdiff() had failed when I tried it, but I now see that the use of eachrow() is key. I’ll play with these options to get more familiar with their use and syntax.
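
For reference, a setdiff()-based version could look like the sketch below. Converting each row to a NamedTuple with copy() is my own variation, not something posted in this thread; the names `rowsA`, `rowsB`, and `result` are illustrative.

```julia
using DataFrames

dfA = DataFrame(:Parameter => ["size", "weight", "color", "smell"],
                :Value => ["small", "120", "red", "none"])
dfB = DataFrame(:Parameter => ["size", "color", "smell", "temperature", "price"],
                :Value => ["small", "blue", "none", "hot", "50"])

# copy(::DataFrameRow) returns a NamedTuple, so each DataFrame becomes
# a plain Vector of NamedTuples that setdiff can compare element by element.
rowsA = copy.(eachrow(dfA))
rowsB = copy.(eachrow(dfB))

# Rows present in dfB but not in dfA, rebuilt into a DataFrame.
result = DataFrame(setdiff(rowsB, rowsA))
```

Note that setdiff() also removes duplicates, so if dfB contained the same row twice it would appear only once in the result, which differs from antijoin().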