Query.jl - User-set missing values in data frames not removed by @dropna

danvinci · January 28, 2021, 2:10am

Hi everyone!

I’m new to Julia and trying to learn how to use the Query package with data frames, and this question/issue is specific to it.

I’m running into the following issue: after importing some CSV data, I need to get rid of unusable values. These are either ‘nothing’ or some strange value that I need to @mutate into something else before deciding whether to drop it.

Here’s an example:

using DataFrames, Query

df_ex1 = DataFrame(names = ["name 1", "name 2", nothing, "name 4"])

df_ex1_pipe = df_ex1 |>
    @mutate(names = if isnothing(_.names) missing; else _.names; end) |>
    @dropna(:names) |>
    DataFrame

I’m running the code in Pluto. The 3rd value is printed as ‘missing’, but the row is not dropped as I’d expect: the result is the same as the original, with nothing turned into missing.

I also tried this other way:

df_ex2 = DataFrame(names = ["name 1", "name 2", nothing, "name 4"])

df_ex_pipe = df_ex2 |>
    @mutate(names = replace([_.names], nothing => missing)[1]) |>
    @dropna(:names) |>
    DataFrame

with the same result.

Note: here I wrap each value in a 1-element vector so that I can call replace on it, then take the element back out.

Am I doing something totally off?

I tried dropmissing! on the data frame produced by the first pipe and it works as expected.
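For reference, a minimal sketch of the DataFrames-only route, assuming the column is converted outside of Query first. The variable name `cleaned` is made up for illustration, and the non-mutating `dropmissing` is used here instead of `dropmissing!`:

```julia
using DataFrames

df = DataFrame(names = ["name 1", "name 2", nothing, "name 4"])

# replace returns a new vector in which every `nothing` becomes `missing`
df.names = replace(df.names, nothing => missing)

# dropmissing returns a copy without the rows where :names is missing
cleaned = dropmissing(df, :names)
```

Converting the whole column at once also avoids wrapping each value in a 1-element vector.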

Bonus issue:

When I run a variant of the 1st implementation without the ‘else’ branch, all values for which the check is false are turned into ‘nothing’ instead of being left as they were. Is this by design or an unintended side effect?
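This looks consistent with plain Julia semantics rather than anything specific to Query: an `if` block with no `else` branch evaluates to `nothing` when its condition is false. A minimal sketch outside of Query, where both function names are hypothetical and exist only for illustration:

```julia
# No else branch: when the condition is false the whole `if` expression
# evaluates to `nothing`, so the original value is lost.
function without_else(x)
    if isnothing(x)
        missing
    end
end

# Explicit else branch (written as a ternary): the original value is kept.
with_else(x) = isnothing(x) ? missing : x

a = without_else(nothing)    # condition true, yields missing
b = without_else("name 1")   # condition false, yields nothing
c = with_else("name 1")      # condition false, yields "name 1"
```

Since the expression passed to @mutate is evaluated for each row, the same rule would apply there, so the ‘else’ branch is needed to keep the original values.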

Finally, mandatory thank you to the package creator: I’m having fun learning how to use the package and seeing existing code turning into a very elegant series of pipes!

(Sometimes it is not clear when to pass arguments as :col_name, col_name, "col_name", or _.col_name. I’d appreciate any suggestions beyond the official documentation.)

Thanks

quinnj · January 28, 2021, 4:42am

Ah, you’ve run into the unfortunate case that Query uses a non-standard representation for missing values. I believe you need to replace with NA, but I can’t quite remember the exact usage details.
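One way to sidestep the question of how to write NA inside @mutate is to do the conversion before the data enters the pipe. The sketch below is an alternative under that assumption, not the exact fix referred to above; it relies on Query treating `missing` values already stored in a data frame column as its own missing representation, which is the case @dropna is documented for. The variable name `result` is made up for illustration:

```julia
using DataFrames, Query

df = DataFrame(names = ["name 1", "name 2", nothing, "name 4"])

# Convert `nothing` to `missing` on the column itself, before querying.
df.names = replace(df.names, nothing => missing)

# With the missing value stored in the source table, @dropna can see it.
result = df |> @dropna(:names) |> DataFrame
```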

danvinci · January 28, 2021, 2:13pm

Hi @quinnj !

That did the trick indeed, thanks for pointing that out.
