Query.jl - User-set missing values in data frames not removed by @dropna

Hi everyone!

I’m new to Julia and am learning to use the Query package with data frames. This question is specific to Query.

I’m running into the following issue: after importing some CSV data, I need to get rid of unusable values. These are either ‘nothing’ or some odd value that I first need to @mutate into something else before deciding whether to drop it.

Here’s an example:

using DataFrames, Query

df_ex1 = DataFrame(names = ["name 1", "name 2", nothing, "name 4"])

df_ex1_pipe = df_ex1 |>
    @mutate(names = if isnothing(_.names) missing; else _.names; end) |>
    @dropna(:names) |>
    DataFrame

I’m running the code in Pluto. In the printed result the 3rd value shows as ‘missing’, but the row is not dropped as I’d expect. In other words, the output is the original data frame with ‘nothing’ turned into ‘missing’ and no rows removed.

I also tried this other way:

df_ex2 = DataFrame(names = ["name 1", "name 2", nothing, "name 4"])

df_ex_pipe = df_ex2 |>
    @mutate(names = replace([_.names], nothing => missing)[1]) |>
    @dropna(:names) |>
    DataFrame

with the same result.

Note: here I’m wrapping each value in a 1-element vector so that I can call replace on it, then taking the single element back out.

Am I doing something totally off?

I tried dropmissing! on the df as modified by the first pipe and it works as expected.
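
For reference, here is a minimal sketch of a DataFrames-only version of that route, with no Query involved. It assumes the same toy column as above; the names `df_ex3` and `cleaned` are just illustrative.

```julia
using DataFrames

df_ex3 = DataFrame(names = ["name 1", "name 2", nothing, "name 4"])

# Replace `nothing` with the standard `missing` across the whole column.
df_ex3.names = replace(df_ex3.names, nothing => missing)

# dropmissing works on the standard `missing`, so the third row is removed.
cleaned = dropmissing(df_ex3, :names)
```

This uses the non-mutating `dropmissing`, so `df_ex3` keeps its `missing` row and `cleaned` holds the filtered copy.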

Bonus issue:

When I run a variant of the 1st implementation without the ‘else’ branch, all values for which the check is false are turned into ‘nothing’ instead of being left as they were. Is this by design or an unintended side effect?
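
For what it’s worth, this looks like plain Julia behaviour rather than anything specific to Query: an `if` expression with no `else` branch evaluates to `nothing` when its condition is false. A small sketch outside of Query, where `without_else` and `with_else` are hypothetical helper names used only for illustration:

```julia
# Mirrors the @mutate body with the else branch removed.
function without_else(x)
    if isnothing(x)
        missing
    end
end

# Mirrors the original body: values that pass the check are kept as they are.
with_else(x) = isnothing(x) ? missing : x

a = without_else(nothing)    # the branch runs, so this is `missing`
b = without_else("name 1")   # no branch runs, so the `if` evaluates to `nothing`
c = with_else("name 1")      # the value is passed through unchanged
```

So the ‘else’ branch (or a ternary) seems to be needed whenever the untouched values should survive.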

Finally, a mandatory thank you to the package creator: I’m having fun learning how to use the package and seeing existing code turn into a very elegant series of pipes!

(Sometimes it is not clear when to pass arguments as :col_name, col_name, “col_name”, or _.col_name - I’d appreciate any suggestions beyond the official documentation.)

Thanks :slight_smile:

Ah, you’ve run into the unfortunate case that Query uses a non-standard representation for missing values. I believe you need to replace with NA, but I can’t quite remember the exact usage details.
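
Something along these lines might work, but treat it as an untested sketch rather than the definitive usage. It assumes DataValues.jl (the package behind Query’s `DataValue`/`NA` representation) is installed so it can be loaded directly, and it wraps every value in a `DataValue{String}` so the column has a single type; the name `cleaned` is just illustrative.

```julia
using DataFrames, Query, DataValues

df = DataFrame(names = ["name 1", "name 2", nothing, "name 4"])

cleaned = df |>
    # An empty DataValue{String}() is the "NA" that Query understands;
    # everything else is wrapped as a DataValue holding the original string.
    @mutate(names = isnothing(_.names) ? DataValue{String}() : DataValue{String}(_.names)) |>
    @dropna(:names) |>
    DataFrame
```

The point is only that `@dropna` looks for Query’s own missing representation, not for the standard `missing`, which would explain why the original pipe left the row in place.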

Hi @quinnj !

That did the trick indeed, thanks for pointing that out.