Jake
1
I am trying to select rows of a DataFrame by matching on a column that also contains missing values. As an example, the following code, where the column has no missing values, works:
julia> df = DataFrame(name=["a", "b", "c"], value=[1,2,3])
3×2 DataFrame
Row │ name value
│ String Int64
─────┼───────────────
1 │ a 1
2 │ b 2
3 │ c 3
julia> df[df.name .== "a", :]
1×2 DataFrame
Row │ name value
│ String Int64
─────┼───────────────
1 │ a 1
julia>
However, when a missing value is introduced in the column, it no longer works, as shown here:
julia> df = DataFrame(name=["a", "b", missing], value=[1,2,3])
3×2 DataFrame
Row │ name value
│ String? Int64
─────┼────────────────
1 │ a 1
2 │ b 2
3 │ missing 3
julia> df[df.name .== "a", :]
ERROR: ArgumentError: unable to check bounds for indices of type Missing
Stacktrace:
[1] checkindex(#unused#::Type{Bool}, inds::Base.OneTo{Int64}, i::Missing)
@ Base .\abstractarray.jl:725
[2] checkindex
@ .\abstractarray.jl:740 [inlined]
[3] getindex(df::DataFrame, row_inds::Vector{Union{Missing, Bool}}, #unused#::Colon)
@ DataFrames C:\Users\jakez\.julia\packages\DataFrames\dgZn3\src\dataframe\dataframe.jl:600
[4] top-level scope
@ REPL[34]:1
julia>
What is the technique to find the requested row?
julia> df = DataFrame(name=["a", "b", missing, "a"], value=[1,2,3,4])
4×2 DataFrame
Row │ name value
│ String? Int64
─────┼────────────────
1 │ a 1
2 │ b 2
3 │ missing 3
4 │ a 4
julia> named_a = findall(skipmissing(df.name)) do nm
nm == "a"
end
2-element Vector{Int64}:
1
4
julia> df[named_a, :]
2×2 DataFrame
Row │ name value
│ String? Int64
─────┼────────────────
1 │ a 1
2 │ a 4
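As a rough sketch of why this works: skipmissing keeps the keys of the parent vector, so findall reports positions in the original column rather than positions in the filtered sequence. The plain vector name below is only a stand-in for df.name (an assumption for illustration), so no packages are needed.

```julia
# Plain vector standing in for df.name; Base Julia only.
name = ["a", "b", missing, "a"]

# keys(skipmissing(...)) are the parent indices of the non-missing entries.
kept = collect(keys(skipmissing(name)))

# findall therefore returns indices that are valid for the original vector.
named_a = findall(==("a"), skipmissing(name))

# Indexing the original vector with those indices picks the matching entries.
picked = name[named_a]
```

Because the indices refer to the original column, they can be passed straight to df[named_a, :] as above.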
Dan
3
Another option using the nifty coalesce function:
julia> df[coalesce.(df.name .== "a",false),:]
2×2 DataFrame
Row │ name value
│ String? Int64
─────┼────────────────
1 │ a 1
2 │ a 4
The coalesce function and its equivalent for replacing nothing values, something(...), are very useful and cool in my opinion.
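For anyone who has not used these two before, here is a minimal sketch in plain Julia. The vector name stands in for df.name, and the lookup of "z" is just a made-up example to get a nothing value.

```julia
# Plain vector standing in for df.name; Base Julia only.
name = ["a", "b", missing, "a"]

# Broadcasting == propagates missing, so the mask is not purely Bool.
mask = name .== "a"

# coalesce returns its first non-missing argument, so missing becomes false.
clean = coalesce.(mask, false)

# something is the counterpart for nothing: it returns the first
# argument that is not nothing.
idx = findfirst(isequal("z"), name)   # "z" is absent, so this is nothing
pos = something(idx, 0)               # fall back to 0
```

The cleaned mask contains only Bool values, which is what row indexing needs.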
And a technique using DataFramesMeta.jl is:
julia> using DataFramesMeta
julia> @rsubset(df, :name == "a")
1×2 DataFrame
Row │ name value
│ String? Int64
─────┼────────────────
1 │ a 1
also:
julia> df[isequal.(df.name, "a"), :]
1×2 DataFrame
Row │ name value
│ String? Int64
─────┼────────────────
1 │ a 1
Jake
6
Thank you for all the answers, I don’t know how to mark all of them as a solution!
using DataFrames
df = DataFrame(name=["a", "b", missing,"a"], value=[1,2,3,4])
dfm=dropmissing(df,:name)
dfm[dfm.name .== "a", :]
df[.===("a",df.name),:]
I have some macros to make this easier in MissingsAsFalse.jl.
The original failure:
df[df.name .== "a", :]
has a close solution, as per @rocco_sprmnt21:
df[df.name .=== "a", :]
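To see why the variants in this thread behave differently, here is a small sketch comparing the three comparison operators on a plain vector. The vector name is only a stand-in for df.name, so this runs without DataFrames.

```julia
# Plain vector standing in for df.name; Base Julia only.
name = ["a", "b", missing, "a"]

# == follows three-valued logic: comparing with missing gives missing.
eq_mask = name .== "a"

# isequal always returns a Bool; missing is simply not equal to "a".
isequal_mask = isequal.(name, "a")

# === also always returns a Bool.
egal_mask = name .=== "a"

# A mask without missing entries can be used for logical indexing.
picked = name[egal_mask]
```

One caveat: === is stricter about types than isequal. For example, 1 === 1.0 is false while isequal(1, 1.0) is true, so isequal is probably the safer default for numeric columns.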