# Filter a number when missing values are in column

Hi i want filter the rows having perticular value, but the column contains missing values as well. can some body help me to do this
data frame i have

``````df = DataFrame()
df.a = ([3,5,6,0,missing])
df.b = 1:5
``````

i want to filter rows with value 5 in column df.a

``````df = filter(r -> r.a == 5 , df)
``````

this code gives me an non-boolean error

data frame i want is

``````df = DataFrame()
df.a = 5
df.b = 2
``````

Thanks

``````df = filter( r -> !ismissing(r.a) && r.a == 5 , df)
``````
2 Likes

For working with `missing` imho the best thing is generally to use `isequal`, because

``````julia> 5 == missing
missing

julia> isequal(5, missing)
false
``````

so

``````julia> df[isequal.(df.a, 5), :]
1×2 DataFrame
Row │ a       b
│ Int64?  Int64
─────┼───────────────
1 │      5      2
``````

(Incidentally, are you aware that you can assign columns directly in the DataFrame construtor, i.e. `DataFrame(a = [3, 5, 6, 0, missing], b = 1:5])`?

4 Likes

What’s the rationale behind that?

Doesn’t explain it but there is some logic inside:

``````julia> missing == missing
missing

julia> missing === missing
true
``````

The idea is that you don’t know - `missing` represents a value that’s missing in a statistical sense, i.e. the value exists but is not known (for example because it was not measured, someone didn’t answer that survey question etc.). You therefore can’t be sure that it’s not 5, so returning `false` wouldn’t be correct.

(There’s been a lot of ink spilled over this and how it can make working with DataFrames in Julia less ergonomic than some other popular languages, but it is what it is!)

4 Likes

Thanks, seems useful in many contexts. Of course it is then arbitrary that `isequal` returns `false`, but that allows one to choose what one wants, and probably that is the reason for the two alternatives.

Yes, at least that’s what the docstring suggests:

``````help?> isequal

isequal(x, y)

Similar to ==, except for the treatment of floating point numbers and of missing values.
isequal treats all floating-point NaN values as equal to each other, treats -0.0 as unequal
to 0.0, and missing as equal to missing. Always returns a Bool value.
``````
2 Likes

thanks @oheil @nilshg @lmiq for your help.

1 Like

how can remove only df.a = 5 row, keep the remianing data frame as such.

I wrote a package, MissingsAsFalse.jl which solves this exact problem via a macro.

``````@mfalse missing == 5
``````
1 Like

I would rather recommend (I ignore performance considerations):

``````filter(r -> coalesce(r.a == 5, false) , df)
``````

for the following reasons:

• by passing `true` you can decide to keep these rows
• it is more explicit what is going on
• `==` and `isequal` have a bit different behavior, so if you want `==` it is best to stick to using `==`
2 Likes

MissingsAsFalse.jl is of course a valid alternative. I was commenting on using `isequal`.

1 Like

Couple more tips

get a view of the dataframe with no missings in any column (no duplicating data)

``````julia> dfv  = dropmissing(df; view=true)
4×2 SubDataFrame
Row │ a       b
│ Int64?  Int64
─────┼───────────────
1 │      3      1
2 │      5      2
3 │      6      3
4 │      0      4
``````

or just particular columns

``````julia> dfv  = dropmissing(df, [:a]; view=true)
4×2 SubDataFrame
Row │ a       b
│ Int64?  Int64
─────┼───────────────
1 │      3      1
2 │      5      2
3 │      6      3
4 │      0      4
``````

specifying just the column in the filter is more efficient

``````julia> filter(:a=>a -> a == 5 , dfv)
1×2 DataFrame
Row │ a       b
│ Int64?  Int64
─────┼───────────────
1 │      5      2

``````

or multiples

``````julia> filter([:a, :b]=>(a,b) -> a >b, dfv)
3×2 DataFrame
Row │ a       b
│ Int64?  Int64
─────┼───────────────
1 │      3      1
2 │      5      2
3 │      6      3

``````

and finally, again you can filter to a view

``````julia> filter([:a]=>a -> a == 5 , dfv; view=true)
1×2 SubDataFrame
Row │ a       b
│ Int64?  Int64
─────┼───────────────
1 │      5      2

``````

how to remove only df.a = 5, so ill get a dataframe

``````df2 = DataFrame(a = [3,6,0,missing], b = [1,3,4,5] )
``````

I’m going to give an answer using DataFramesMeta.jl

``````@rsubset df @mfalse :a != 5
``````
1 Like

i have loaded DataFramesMeta , still if shows @mfalse not defined

``````
julia> using DataFramesMeta

julia> amik9d = @rsubset amik9 @mfalse :RATE != 0
ERROR: LoadError: UndefVarError: @mfalse not defined
``````

You also need to add the package MissingsAsFalse.jl as well. Though actually the `@rsubset` command will work even without `@mfalse`.

1 Like

it removed both missing and 5 from df.a, i wanted to remove only 5 and keep missing as such

Ah interesting! That would indeed happen and is a bit of a corner case I hadn’t thought of. You would have to specify this directly, unfortunately

``````@rsubset df @mfalse ismissing(:a) || :a != 5
``````

I will have to think about a better syntax to support this.

EDIT: This isn’t going to work either. You will have to use `if ... else` commands.

``````@rsubset df ismissing(:a) ? true : :a != 5
``````
1 Like