Replace missing values with .= in DataFrame

The following MWE shows a problem I ran into when trying to replace the missing values for a specific set of observations in a DataFrame. It works with =, but not with .=.

Is this a bug? If not, can someone please clarify what is going on?

using DataFrames, Missings
df = DataFrame(wage = [0.0, missing, 10.0, 20.0])
df[ismissing.(df[:wage]), :wage] .= 0 # note the . before the =
find(ismissing, df[:wage])            # 2, still there

Compare with arrays, where .= works fine:

A = [0.0, missing, 10.0, 20.0]
A[ismissing.(A)] .= 0
A                               # OK

Versions:

julia> Pkg.status.(["DataFrames", "Missings"]);
 - DataFrames                    0.11.3
 - Missings                      0.2.4

julia> VERSION
v"0.6.2"

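The reason is that on 0.6.2, .= calls dotview (defined in broadcast.jl), which creates a view of the left-hand side when it is an AbstractArray, as in your array example. There is no such rule for DataFrame, so dotview falls back to getindex, which returns a copy of the selected entries; .= then fills that copy, and the original column is left unchanged.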
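A minimal sketch of this mechanism, assuming Julia 1.x (where missing is built in and dotview still falls back to getindex for containers that are not AbstractArrays). Column is a hypothetical stand-in for a DataFrame, not part of DataFrames; its getindex returns a copy, as DataFrame indexing did.

```julia
# Plain arrays: dotview returns a view, so .= writes into A itself.
A = [0.0, missing, 10.0, 20.0]
mask = ismissing.(A)
@show Base.Broadcast.dotview(A, mask) isa SubArray   # true
A[mask] .= 0

# Hypothetical container that is NOT an AbstractArray.
struct Column
    data::Vector{Union{Missing,Float64}}
end
Base.getindex(c::Column, i) = c.data[i]   # returns a copy for a mask index

c = Column([0.0, missing, 10.0, 20.0])
# No dotview method for Column, so dotview falls back to getindex:
# the 0 is written into a temporary copy that is then discarded.
c[ismissing.(c.data)] .= 0
@show count(ismissing, c.data)   # 1: the missing value is still there
```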

This can be fixed by adding a definition:

import Base.Broadcast.dotview
Base.@propagate_inbounds dotview(df::DataFrame, row, col::Union{Real,Symbol}) = view(df[col], row)
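The same idea can be sketched on a toy type, assuming Julia 1.x. Column is again a hypothetical stand-in for a DataFrame; the added dotview method plays the role of the definition above by returning a view of the stored data, so .= writes through instead of filling a copy.

```julia
# Hypothetical container that is not an AbstractArray (stand-in for DataFrame).
struct Column
    data::Vector{Union{Missing,Float64}}
end
Base.getindex(c::Column, i) = c.data[i]   # returns a copy

# Analogue of the fix: make .= target a view of the underlying vector.
Base.Broadcast.dotview(c::Column, i) = view(c.data, i)

c = Column([0.0, missing, 10.0, 20.0])
c[ismissing.(c.data)] .= 0
@show c.data   # no missing values left
```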

Note that this is a solution for 0.6.2 only; in 0.7 the implementation of .= has changed.

As an additional comment, you can write:

df[ismissing.(df[:wage]), :wage] = 0

to get what you want (but I understand this was not your question).