Is there a package for comparing DataFrames?

`a == b` works well when a and b are the same, but when they differ it doesn't tell me what's different, e.g. that some column names are different.

I wonder if there's a package for comparing DataFrames, like PROC COMPARE in SAS or pandas' `assert_frame_equal` for data frames.
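Short of a dedicated package, a rough PROC COMPARE-style report needs only a few DataFrames.jl calls. The sketch below is a hypothetical helper, `compare_frames`, not part of any package. It assumes rows are matched by position rather than by key columns, and it only compares individual cells when both frames have the same number of rows.

```julia
using DataFrames

# Hypothetical helper, not part of DataFrames.jl: a rough PROC COMPARE-style
# summary. Assumes rows are matched by position (no key columns).
function compare_frames(a::AbstractDataFrame, b::AbstractDataFrame)
    only_a = setdiff(names(a), names(b))    # columns present only in `a`
    only_b = setdiff(names(b), names(a))    # columns present only in `b`
    shared = intersect(names(a), names(b))  # columns in both, in `a`'s order
    cells = Tuple{Int,String,Any,Any}[]     # (row, column, value in a, value in b)
    # Cell-by-cell comparison only makes sense when the rows line up.
    if nrow(a) == nrow(b)
        for col in shared, i in 1:nrow(a)
            x, y = a[i, col], b[i, col]
            # `isequal` (unlike `==`) treats missing/missing and NaN/NaN as equal
            isequal(x, y) || push!(cells, (i, col, x, y))
        end
    end
    return (only_in_a = only_a, only_in_b = only_b,
            nrows = (nrow(a), nrow(b)), cell_diffs = cells)
end

a = DataFrame(id = [1, 2], price = [5.0, 1.0], note = ["x", "y"])
b = DataFrame(id = [1, 2], price = [7.0, 1.0], qty = [3, 4])
compare_frames(a, b)
```

For these two example frames, the result lists `note` as present only in `a`, `qty` as present only in `b`, and a single differing cell: row 1 of `price` (5.0 vs 7.0). Matching rows on key columns (for example with a join) would be the natural extension when row order is not meaningful.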

Did you set a reminder to check on this annually? :smiley:


It seems I need to do this roughly once a year.

I thought Discourse would pick up the post.

I guess there isn't exactly what I need, so I might have to create something.

Does the response from Bogumił Kamiński on Stack Overflow help a bit?

It would be really nice if @ssfrr's DeepDiffs.jl directly supported Tables (probably displaying via PrettyTables.jl), but it does not.

It does, however, support vectors, so here are some demos:

```julia
julia> using DataFrames, DeepDiffs

julia> costs = DataFrame(brand=["BurgerDuke", "BurgerDuke", "BurgerDuke", "McDannys", "McDannys"],
                         product=["burger", "fries", "drink", "burger", "drink"],
                         price=[10.0, 5.0, 1.0, 9.5, 0.5])
5×3 DataFrame
 Row │ brand       product  price
     │ String      String   Float64
─────┼──────────────────────────────
   1 │ BurgerDuke  burger      10.0
   2 │ BurgerDuke  fries        5.0
   3 │ BurgerDuke  drink        1.0
   4 │ McDannys    burger       9.5
   5 │ McDannys    drink        0.5

julia> costs2 = DataFrame(brand=["BurgerDuke", "BurgerDuke", "BurgerDuke", "McDannys", "McDannys"],
                          product=["burger", "fries", "drink", "burger", "drink"],
                          price=[10.0, 7.0, 1.0, 9.5, 0.5])
5×3 DataFrame
 Row │ brand       product  price
     │ String      String   Float64
─────┼──────────────────────────────
   1 │ BurgerDuke  burger      10.0
   2 │ BurgerDuke  fries        7.0
   3 │ BurgerDuke  drink        1.0
   4 │ McDannys    burger       9.5
   5 │ McDannys    drink        0.5
```

by row
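A row-wise diff can be sketched by turning each table into a vector of rows and handing those vectors to DeepDiffs. This assumes DataFrames.jl, DeepDiffs.jl and Tables.jl are installed; `Tables.rowtable` returns a `Vector` of `NamedTuple`s, one per row, so `deepdiff` treats each whole row as a single vector element. The variable name `row_diff` is just for illustration.

```julia
using DataFrames, DeepDiffs, Tables

costs = DataFrame(brand=["BurgerDuke", "BurgerDuke", "BurgerDuke", "McDannys", "McDannys"],
                  product=["burger", "fries", "drink", "burger", "drink"],
                  price=[10.0, 5.0, 1.0, 9.5, 0.5])
costs2 = copy(costs)
costs2[2, :price] = 7.0   # the one value that differs between the demo tables

# One NamedTuple per row, so deepdiff compares whole rows as vector elements.
row_diff = deepdiff(Tables.rowtable(costs), Tables.rowtable(costs2))
```

At the REPL the returned diff prints as a colored listing; here the changed row should appear as row 2 removed from `costs` and row 2 added in `costs2`, and `removed(row_diff)` / `added(row_diff)` give those indices programmatically.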


by column
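Diffing column by column keeps the output focused on the columns that actually changed. Under the same assumptions (DataFrames.jl and DeepDiffs.jl installed; `col_diffs` is a hypothetical name), this sketch diffs each shared column's vector and keeps only the columns that are not equal.

```julia
using DataFrames, DeepDiffs

costs = DataFrame(brand=["BurgerDuke", "BurgerDuke", "BurgerDuke", "McDannys", "McDannys"],
                  product=["burger", "fries", "drink", "burger", "drink"],
                  price=[10.0, 5.0, 1.0, 9.5, 0.5])
costs2 = copy(costs)
costs2[2, :price] = 7.0

# Column name => vector diff, only for shared columns whose contents differ.
col_diffs = Dict(col => deepdiff(costs[!, col], costs2[!, col])
                 for col in intersect(names(costs), names(costs2))
                 if !isequal(costs[!, col], costs2[!, col]))
```

Only `"price"` ends up in `col_diffs`; printing `col_diffs["price"]` shows the 5.0 → 7.0 change in context. Columns present in only one table are skipped here and would need a separate `setdiff(names(costs), names(costs2))` check.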

Perhaps the best approach is to convert each table to a dict:

```julia
using Tables
as_dict(t) = getfield(Tables.dictcolumntable(t), :values)  # column name => column vector
```
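With the columns in a dictionary, `deepdiff` can compare whole tables, and a renamed or missing column shows up as a removed or added key, which is the column-name case from the original question. A sketch, assuming DataFrames.jl, DeepDiffs.jl and Tables.jl are installed. Each dictionary is wrapped in a plain `Dict` as a precaution, since it is an assumption here which dictionary types DeepDiffs' dict method accepts.

```julia
using DataFrames, DeepDiffs, Tables

# As suggested above: pull the name => column dictionary out of a DictColumnTable.
as_dict(t) = getfield(Tables.dictcolumntable(t), :values)

costs = DataFrame(brand=["BurgerDuke", "BurgerDuke", "BurgerDuke", "McDannys", "McDannys"],
                  product=["burger", "fries", "drink", "burger", "drink"],
                  price=[10.0, 5.0, 1.0, 9.5, 0.5])
costs2 = copy(costs)
costs2[2, :price] = 7.0                   # same column names, one changed value
costs3 = rename(costs, :price => :cost)   # same values, one renamed column

# Plain Dict wrapper is a precaution (assumption: DeepDiffs may not handle
# every AbstractDict subtype the same way).
value_diff = deepdiff(Dict(as_dict(costs)), Dict(as_dict(costs2)))
name_diff = deepdiff(Dict(as_dict(costs)), Dict(as_dict(costs3)))
```

In `value_diff` no keys are added or removed and the `price` entry recurses into a vector diff; in `name_diff` the key `:price` is reported as removed and `:cost` as added.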
