Find DataFrame row with missing values present

I am trying to select the rows of a DataFrame whose column matches a given value when that column also contains missing values. As an example, the following code, with no missing values, works:

julia> df = DataFrame(name=["a", "b", "c"], value=[1,2,3])
3×2 DataFrame
 Row │ name    value
     │ String  Int64
─────┼───────────────
   1 │ a           1
   2 │ b           2
   3 │ c           3

julia> df[df.name .== "a", :]
1×2 DataFrame
 Row │ name    value
     │ String  Int64
─────┼───────────────
   1 │ a           1


However, when a missing value is introduced in the column, it no longer works, as shown here:

julia> df = DataFrame(name=["a", "b", missing], value=[1,2,3])
3×2 DataFrame
 Row │ name     value
     │ String?  Int64
─────┼────────────────
   1 │ a            1
   2 │ b            2
   3 │ missing      3

julia> df[df.name .== "a", :]
ERROR: ArgumentError: unable to check bounds for indices of type Missing
Stacktrace:
 [1] checkindex(#unused#::Type{Bool}, inds::Base.OneTo{Int64}, i::Missing)
   @ Base .\abstractarray.jl:725
 [2] checkindex
   @ .\abstractarray.jl:740 [inlined]
 [3] getindex(df::DataFrame, row_inds::Vector{Union{Missing, Bool}}, #unused#::Colon)
   @ DataFrames C:\Users\jakez\.julia\packages\DataFrames\dgZn3\src\dataframe\dataframe.jl:600
 [4] top-level scope
   @ REPL[34]:1


What is the technique to find the requested row?

julia> df = DataFrame(name=["a", "b", missing, "a"], value=[1,2,3,4])
4×2 DataFrame
 Row │ name     value
     │ String?  Int64
─────┼────────────────
   1 │ a            1
   2 │ b            2
   3 │ missing      3
   4 │ a            4

julia> named_a = findall(skipmissing(df.name)) do nm
       nm == "a"
       end
2-element Vector{Int64}:
 1
 4

julia> df[named_a, :]
2×2 DataFrame
 Row │ name     value
     │ String?  Int64
─────┼────────────────
   1 │ a            1
   2 │ a            4
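The reason this works is that skipmissing keeps the indices of the parent vector, so findall reports positions in the original column rather than in a compacted copy. A minimal sketch in Base Julia only, where `name_col` is a hypothetical stand-in for `df.name`:

```julia
# `name_col` is a hypothetical stand-in for df.name (no DataFrames needed).
name_col = ["a", "b", missing, "a"]

# skipmissing preserves the parent's indices, so these are positions
# in name_col itself and can be used directly as row indices.
named_a = findall(==("a"), skipmissing(name_col))

# Equivalent explicit form: rule out missing first, then compare.
named_a_explicit = findall(x -> !ismissing(x) && x == "a", name_col)
```

Both forms should give the same index vector, which can then be passed to `df[named_a, :]` as above.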

Another option using the nifty coalesce function:

julia> df[coalesce.(df.name .== "a",false),:]
2×2 DataFrame
 Row │ name     value 
     │ String?  Int64 
─────┼────────────────
   1 │ a            1
   2 │ a            4

The coalesce function and its equivalent for replacing nothing values, something(...), are very useful and cool in my opinion.
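To make the two functions concrete, here is a small sketch in Base Julia only. The names `name_col`, `raw_mask`, `mask`, `idx` and `fallback` are made up for illustration and do not come from the posts above:

```julia
# `name_col` is a hypothetical stand-in for df.name.
name_col = ["a", "b", missing, "a"]

# Comparing with == propagates missing, so the mask is not purely Bool.
raw_mask = name_col .== "a"

# coalesce returns its first non-missing argument, so missing becomes false.
mask = coalesce.(raw_mask, false)

# something does the same for `nothing`: findfirst returns nothing when
# no element matches, and something swaps in a default (0 here).
idx = findfirst(isequal("z"), name_col)
fallback = something(idx, 0)
```

After the coalesce step the mask is a plain Bool vector, which is what row indexing expects.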


And a technique using DataFramesMeta.jl is:

julia> using DataFramesMeta

julia> @rsubset(df, :name == "a")
1×2 DataFrame
 Row │ name     value
     │ String?  Int64
─────┼────────────────
   1 │ a            1

Also, using isequal, which never returns missing:

julia> df[isequal.(df.name, "a"), :]
1×2 DataFrame
 Row │ name     value
     │ String?  Int64
─────┼────────────────
   1 │ a            1

Thank you for all the answers, I don’t know how to mark all of them as a solution!

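using DataFrames
df = DataFrame(name=["a", "b", missing, "a"], value=[1,2,3,4])

# Option 1: drop the rows whose name is missing, then compare as usual
dfm = dropmissing(df, :name)
dfm[dfm.name .== "a", :]

# Option 2: broadcast ===, which always returns a Bool, never missing
df[.===("a", df.name), :]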

I have some macros to make this easier in MissingsAsFalse.jl.


The original failure:

df[df.name .== "a", :]

has a close solution, as per @rocco_sprmnt21:

df[df.name .=== "a", :]
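For anyone wondering why one extra character makes the difference: as far as I understand, == follows three-valued logic and returns missing when either side is missing, while isequal and === always return a Bool. A quick sketch in Base Julia, with `name_col` as a hypothetical stand-in for `df.name`:

```julia
# `name_col` is a hypothetical stand-in for df.name.
name_col = ["a", "b", missing]

# == propagates missing, so the third entry of the mask is missing,
# which is what the original indexing call rejects.
eq_mask = name_col .== "a"

# isequal and === never return missing; missing simply compares unequal to "a".
isequal_mask = isequal.(name_col, "a")
egal_mask = name_col .=== "a"

# A pure Bool mask can be used for indexing.
selected = name_col[egal_mask]
```

The first mask has element type Union{Missing, Bool}, matching the `Vector{Union{Missing, Bool}}` in the stack trace from the question, while the other two are plain Bool masks.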