Broadcasting: Difference between = and .= for DataFrames

Hi,
I wanted to change some rows of a DataFrame with code similar to the following:

using DataFrames
df = DataFrame([1:5, 6:10])
df[df[:x1] .< 3, :x2] .= 99
julia> df
5×2 DataFrame
│ Row │ x1 │ x2 │
├─────┼────┼────┤
│ 1   │ 1  │ 6  │
│ 2   │ 2  │ 7  │
│ 3   │ 3  │ 8  │
│ 4   │ 4  │ 9  │
│ 5   │ 5  │ 10 │

But this did not change the contents of the DataFrame. I had to remove the broadcasting dot for the assignment to take effect:

df[df[:x1] .< 3, :x2] = 99
julia> df
5×2 DataFrame
│ Row │ x1 │ x2 │
├─────┼────┼────┤
│ 1   │ 1  │ 99 │
│ 2   │ 2  │ 99 │
│ 3   │ 3  │ 8  │
│ 4   │ 4  │ 9  │
│ 5   │ 5  │ 10 │

Why does adding the broadcast operator change the behavior? Is it because, for Arrays, only the broadcasted assignment is not deprecated?

Thanks.

df[indices] .= y effectively calls broadcast!(identity, df[indices], y): that is, the broadcast operation (.=) acts on df[indices], which is a copy of the DataFrame's data rather than the original data.
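
To illustrate the copy-versus-view point with a plain Vector instead of a DataFrame, here is a minimal sketch. The names a, mask and rows_copy are made up for the illustration, and the explicit broadcast! call only mimics what was described above; it is not taken from the DataFrames source.

```julia
a = collect(6:10)
mask = [true, true, false, false, false]

# On its own, a[mask] allocates a new array holding the selected elements.
rows_copy = a[mask]

# Broadcasting into that copy fills the copy only.
broadcast!(identity, rows_copy, 99)
original_untouched = (a == collect(6:10))

# Broadcasting into a view writes through to the original data.
view(a, mask) .= 99
```

After the first broadcast only rows_copy has changed; a is modified only once the destination is a view of it.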

Technically, the way to fix this is to define a Base.dotview method for DataFrame.
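
A rough sketch of what that could look like, using a hypothetical ToyTable type rather than the real DataFrame (this is not how DataFrames implements anything). It assumes a Julia version where t[rows, col] .= x is lowered to a broadcast into Base.dotview(t, rows, col), and note that Base.dotview is an unexported internal function, so treat this as illustrative only.

```julia
# Hypothetical stand-in for a table type; not the DataFrames implementation.
struct ToyTable
    columns::Dict{Symbol, Vector{Int}}
end

# Plain indexing returns a copy of the selected rows, like a[idx] for arrays.
Base.getindex(t::ToyTable, rows, col::Symbol) = t.columns[col][rows]

# Dotted assignment asks for the destination via Base.dotview,
# so returning a view lets the broadcast write into the stored column.
Base.dotview(t::ToyTable, rows, col::Symbol) = view(t.columns[col], rows)

t = ToyTable(Dict(:x1 => collect(1:5), :x2 => collect(6:10)))
t[t.columns[:x1] .< 3, :x2] .= 99
```

With the dotview method in place, the dotted assignment updates the first two entries of the stored :x2 column instead of a temporary copy.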


I’d vote that this should probably behave as @uwechsler originally expected. Opened issue.

Somewhat related: because DataFramesMeta works with keyword arguments, there are some inconsistencies in the way .= works there.

 @transform(df, x1 .= 5)

and

 @transform(df, x1 = 5)

have the same behavior in DataFramesMeta.

I might even go as far as to say that df[idx, n] .= x should work but df[idx, n] = x should throw an error. That would be consistent with the behavior of AbstractArray.
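
For reference, this is the array behavior I have in mind, as a small sketch assuming Julia 1.0 or later (on 0.7 the plain form was a deprecation warning rather than an error). The variable names are made up for the example.

```julia
a = collect(6:10)

# Broadcasted assignment fills every selected element in place.
a[1:2] .= 99

# Plain assignment of a single value to several locations is rejected.
scalar_assignment_failed = try
    a[3:4] = 0
    false
catch
    true
end
```

Here the dotted form updates a, while the undotted form raises an error and leaves a[3:4] alone.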
