Is there a package to compare if two DataFrames are the same?

Something that has more options than the == in Base for comparing dataframes? E.g. one where you can set the tolerance of difference in a float column. And also can compare columns by name even though they may be arranged in different orders?

Something like proc compare in SAS.

Not aware of anything baked in, but seems easy enough to do using isapprox?

You’d just have to decide whether equality should be judged elementwise, columnwise, or across the whole DataFrame (e.g. with an absolute tolerance of 0.1, do you consider two dfs equal if every element is within 0.1 of its counterpart, even though the total difference within a column or across all columns is larger than 0.1?)
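To make that distinction concrete, here is a small sketch on plain vectors (the values are made up for illustration). Broadcasting `isapprox` checks each element against the tolerance, while calling `isapprox` on the whole vectors compares the norm of the difference against it, so the two can disagree:

```julia
using LinearAlgebra  # provides norm, which array-level isapprox relies on

# Illustrative data: every element differs by 0.09.
a = zeros(4)
b = fill(0.09, 4)

# Elementwise: each pair is within atol = 0.1 of each other.
elementwise_equal = all(isapprox.(a, b; atol = 0.1))

# Whole vector: norm(a - b) is about 0.18, which exceeds atol = 0.1.
whole_vector_equal = isapprox(a, b; atol = 0.1)

println("elementwise:  ", elementwise_equal)
println("whole vector: ", whole_vector_equal)
println("norm of diff: ", norm(a - b))
```

So the same tolerance can give "equal" elementwise and "not equal" columnwise; which one you want depends on what the comparison is for.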

Just do something like:

julia> using DataFrames

julia> x = rand(10);             

julia> x2 = copy(x); x2[end] += 0.01
0.7120634304599409

julia> df = DataFrame(a = x, b = x, c = rand(10));  

julia> df2 = DataFrame(a = x, b = x2, c = rand(10)); 

julia> isapprox.(df, df2) 
10×3 DataFrame
│ Row │ a    │ b    │ c    │
│     │ Bool │ Bool │ Bool │
├─────┼──────┼──────┼──────┤
│ 1   │ 1    │ 1    │ 0    │
│ 2   │ 1    │ 1    │ 0    │
│ 3   │ 1    │ 1    │ 0    │
│ 4   │ 1    │ 1    │ 0    │
│ 5   │ 1    │ 1    │ 0    │
│ 6   │ 1    │ 1    │ 0    │
│ 7   │ 1    │ 1    │ 0    │
│ 8   │ 1    │ 1    │ 0    │
│ 9   │ 1    │ 1    │ 0    │
│ 10  │ 1    │ 0    │ 0    │

julia> isapprox.(df, df2, atol = 0.15)
10×3 DataFrame
│ Row │ a    │ b    │ c    │
│     │ Bool │ Bool │ Bool │
├─────┼──────┼──────┼──────┤
│ 1   │ 1    │ 1    │ 0    │
│ 2   │ 1    │ 1    │ 0    │
│ 3   │ 1    │ 1    │ 1    │
│ 4   │ 1    │ 1    │ 0    │
│ 5   │ 1    │ 1    │ 0    │
│ 6   │ 1    │ 1    │ 0    │
│ 7   │ 1    │ 1    │ 0    │
│ 8   │ 1    │ 1    │ 0    │
│ 9   │ 1    │ 1    │ 0    │
│ 10  │ 1    │ 1    │ 1    │

and then use some combination of any and all depending on how you want to define whether they’re equal or not.
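As a rough sketch of how that could be wrapped up, including the "match columns by name regardless of order" part of the question: `approx_equal` below is a hypothetical helper, not something DataFrames.jl provides. It assumes float columns should be compared elementwise with `isapprox` and everything else (strings, integers, columns containing `missing`) with `isequal`.

```julia
using DataFrames

# Hypothetical helper, not part of DataFrames.jl.
# Returns true when both frames have the same column names (in any order),
# the same number of rows, and matching contents.
function approx_equal(df1::AbstractDataFrame, df2::AbstractDataFrame; atol::Real = 0.0)
    # Column order is ignored: only the set of names has to match.
    Set(names(df1)) == Set(names(df2)) || return false
    nrow(df1) == nrow(df2) || return false
    for name in names(df1)
        col1, col2 = df1[!, name], df2[!, name]
        if eltype(col1) <: AbstractFloat && eltype(col2) <: AbstractFloat
            # Float columns: every element must be within tolerance.
            all(isapprox.(col1, col2; atol = atol)) || return false
        else
            # Assumption: all other columns must match exactly.
            isequal(col1, col2) || return false
        end
    end
    return true
end

x = [1.0, 2.0, 3.0]
df1 = DataFrame(a = x, b = ["p", "q", "r"])
df2 = DataFrame(b = ["p", "q", "r"], a = x .+ 0.01)  # columns in a different order

loose = approx_equal(df1, df2; atol = 0.05)  # differences of 0.01 are within 0.05
strict = approx_equal(df1, df2)              # default isapprox tolerance is far tighter
println((loose, strict))
```

Swapping `all` for a norm-based `isapprox(col1, col2; atol = atol)` would give the columnwise definition instead of the elementwise one.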
