Applying function to DataFrame with missing values in columns?

Let's say I’ve got a data frame like this:

using DataFrames
df = DataFrame(:x => [1, 3, missing], :y => ["Some ascii text", "Some chinese text 恒基", missing])

In simple cases, map seems to do what I want and just passes the missing values through. For example, the following doesn’t error:

g(x) = x * 10
map(g, df.x)

Union{Missing, Int64}[10, 30, missing]


However, in other cases it doesn’t work. For example, this throws an error:

replace_non_ascii(c) = isascii(c) ? c : ' '
f(x) = map(replace_non_ascii, x)
map(f, df.y)

I can get around it by adding a second method, f(x::Missing) = missing, but I was wondering if there is a nicer way?
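
For reference, the workaround mentioned above can be sketched as follows. It uses a plain vector with the same contents as the y column rather than the data frame itself, so it runs without any packages; the variable names y and result are only for illustration.

```julia
# Replace every non-ASCII character with a space.
replace_non_ascii(c) = isascii(c) ? c : ' '

# Works on strings: map over a String returns a new String.
f(x) = map(replace_non_ascii, x)

# Extra method so that missing is passed through instead of erroring.
f(::Missing) = missing

# Same contents as the y column in the question.
y = ["Some ascii text", "Some chinese text 恒基", missing]
result = map(f, y)
```

The downside is that every function that is not missing-aware needs its own extra method.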

Currently, a broad set of functions is pre-defined to handle missing values. As you’ve seen, though, there are also functions that are not missing-aware. There’s a proposal here to add a generic “lifting” function that would let any function propagate missing. The idea is that we can try out that functionality in a package first and potentially add support for it to the language itself, in the form of an operator or something similar.
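
To give a rough idea of what a generic lifting function could look like, here is a minimal sketch. The name lift is hypothetical and is not the API from the proposal; it simply wraps a function so that it returns missing whenever any positional argument is missing. As far as I know, the Missings.jl package offers passmissing with similar behavior.

```julia
# Hypothetical helper (not part of Base): wrap f so that the result is
# missing when any positional argument is missing, and f(args...) otherwise.
lift(f) = (args...) -> any(ismissing, args) ? missing : f(args...)

# The functions from the question, unchanged and unaware of missing.
replace_non_ascii(c) = isascii(c) ? c : ' '
f(x) = map(replace_non_ascii, x)

# Same contents as the y column in the question.
y = ["Some ascii text", "Some chinese text 恒基", missing]

# f is never called on the missing entry, so nothing errors.
result = map(lift(f), y)
```

With a wrapper like this, f itself needs no extra method for Missing.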

You can use Query.jl for this:

df |> @map({y=f.(_.y)}) |> DataFrame

Note how f is called with the . broadcasting syntax here: inside a Query.jl expression, that automatically lifts the function so that it handles missing (NA) values, even if f itself doesn’t know about them at all.

Yeah, I think having language-level support for this kind of thing would be ideal. Until then I’ll try to make use of Query.jl. Cheers guys.