How to compare non-missing elements of two DataFrames

I'm looking for a terse way to do this so that my @testset stays readable.
In this particular @test, query responses (returned as DataFrames) need to be checked for accuracy.

There has to be a better way to do this:

response = DataFrame(:a=>["a",missing],:g=>["a",missing])
actual = DataFrame(:a=>["a",missing],:g=>["a",missing])
@test actual[:a][(!ismissing).(actual[:a])] == response[:a][(!ismissing).(actual[:a])] &&
      actual[:g][(!ismissing).(actual[:g])] == response[:g][(!ismissing).(actual[:g])]

I’m looking for something like the following, so that equality of the non-missing values in all columns (:a, :g, etc.) is checked without listing the columns explicitly:

notmissing = !ismissing.(actual)
@test all(actual[notmissing] .== response[notmissing])

Edit: added multiple columns and corrected syntax on !ismissing

You need (!ismissing) in parentheses, because !ismissing.(x) is parsed as !(ismissing.(x)).
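A minimal sketch of two spellings that do work, using a plain vector rather than a DataFrame column (the vector `x` and the names `mask1` and `mask2` are made up for illustration):

```julia
x = ["a", missing, "b"]

# `!ismissing.(x)` would apply `!` to the whole result array, which is an error.
mask1 = (!ismissing).(x)   # broadcast the negated function over x
mask2 = .!ismissing.(x)    # broadcast `!` over the result of ismissing.(x)

mask1 == mask2             # the two masks agree
x[mask1]                   # keeps only the non-missing elements
```

Both forms give the same Boolean mask, so which one to use is mostly a matter of taste.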

Thanks for this. I have updated the original post for clarity.

For looking at the whole dataframe, could you do dropmissing for both data frames? I guess that wouldn’t allow for different columns to have different missing placement. You would be dropping too many rows.
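To illustrate the concern about dropping too many rows, here is a small sketch with made-up data in which the missing values sit in different rows of each column:

```julia
using DataFrames

# Hypothetical data: each column is missing in a different row.
df = DataFrame(:a => ["a", missing, "c"], :g => [missing, "b", "c"])

# dropmissing removes every row that has a missing value in any column,
# so only the third row survives.
remaining = nrow(dropmissing(df))

# skipmissing on a single column keeps all non-missing values of that column.
a_values = collect(skipmissing(df.a))
g_values = collect(skipmissing(df.g))
```

Here dropmissing leaves a single row, while the per-column approach still sees two values in each column.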

Also, you should be using the two-argument getindex method (rows first, then columns). Instead of

actual[:a][(!ismissing).(actual[:a])]

do

actual[ismissing.(actual.a) .== false, :a]
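As a rough sketch of the two-argument form on the data from the question (the variable names `mask` and `kept` are only for illustration):

```julia
using DataFrames

actual = DataFrame(:a => ["a", missing], :g => ["a", missing])

# Row selector first, column selector second.
mask = .!ismissing.(actual.a)      # same mask as ismissing.(actual.a) .== false
kept = actual[mask, :a]            # non-missing values of column :a

# actual.a and actual[!, :a] both return the stored column without copying it.
same_column = actual.a === actual[!, :a]
```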

But I think you should do

function check_nonmissing(df1, df2)
    # Compare each column after dropping its missing values.
    for n in names(df1)
        if collect(skipmissing(df1[!, n])) != collect(skipmissing(df2[!, n]))
            return false
        end
    end
    return true
end
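For what it's worth, here is a sketch of how a helper like this might be used inside a @testset. The name `nonmissing_equal` and the test set label are hypothetical, and the sketch assumes both data frames have the column names of the first one:

```julia
using DataFrames, Test

# Hypothetical helper: true when every column holds the same non-missing
# values in the same order. Assumes df2 has all the columns of df1.
function nonmissing_equal(df1, df2)
    return all(collect(skipmissing(df1[!, n])) == collect(skipmissing(df2[!, n]))
               for n in names(df1))
end

response = DataFrame(:a => ["a", missing], :g => ["a", missing])
actual   = DataFrame(:a => ["a", missing], :g => ["a", missing])
changed  = DataFrame(:a => ["b", missing], :g => ["a", missing])

@testset "query response" begin
    @test nonmissing_equal(actual, response)
    @test !nonmissing_equal(actual, changed)
end
```

Note that this compares the sequences of non-missing values only, so two columns whose missing entries sit in different rows can still compare equal.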