I was surprised that view and @view work differently when working with subsets of the data.
The following does not modify the parent DataFrame:
using DataFrames
df = DataFrame(a = [0, 0, 0, 0], b = [1, 1, -1, missing], c = [1, -1, missing, 1])
dfv = dropmissing(df, [:b, :c], view = true)
dfv = view(dfv[dfv.b .> 0, :], :, :)
dfv[1, 1] = 100 # this does not change df
display(df)
4×3 DataFrame
 Row │ a      b        c
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │     0        1        1
   2 │     0        1       -1
   3 │     0       -1  missing
   4 │     0  missing        1
But the following does change the parent DataFrame:
df = DataFrame(a = [0, 0, 0, 0], b = [1, 1, -1, missing], c = [1, -1, missing, 1])
dfv = dropmissing(df, [:b, :c], view = true)
dfv = @view dfv[dfv.b .> 0, :]
dfv[1, 1] = 100 # this changes df
display(df)
4×3 DataFrame
 Row │ a      b        c
     │ Int64  Int64?   Int64?
─────┼─────────────────────────
   1 │   100        1        1
   2 │     0        1       -1
   3 │     0       -1  missing
   4 │     0  missing        1
I know that this could be done in other ways (e.g., with subset and skipmissing=true), but I was wondering if this is expected behavior.
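
For reference, here is a minimal sketch of what seems to separate the two cases. It assumes that dfv[rows, :] is an ordinary getindex call that allocates a new DataFrame, and that @view dfv[rows, :] is rewritten to view(dfv, rows, :), so the two-step function form ends up viewing a copy rather than dfv itself. The names copied and v are introduced here only for illustration.

```julia
using DataFrames

df = DataFrame(a = [0, 0, 0, 0], b = [1, 1, -1, missing], c = [1, -1, missing, 1])
dfv = dropmissing(df, [:b, :c], view = true)  # SubDataFrame over the complete rows of df

# Step 1 of the first case: plain indexing with `:` columns allocates a new DataFrame.
copied = dfv[dfv.b .> 0, :]
println(typeof(copied))

# A view of that copy (or the copy itself) no longer shares memory with df.
copied[1, 1] = 100
println(df.a[1])  # still 0, as in the first case

# Function form that mirrors `@view dfv[dfv.b .> 0, :]`:
# the row selector is passed to view instead of being applied first.
v = view(dfv, dfv.b .> 0, :)
v[1, 1] = 100
println(df.a[1])  # now 100, as in the second case
```

If that reading is right, view(dfv, dfv.b .> 0, :) would be the function-call counterpart of the @view line, while view(dfv[dfv.b .> 0, :], :, :) is a view of a fresh copy.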