`filter` on a `DataFrame` is not allocating a new object as expected

Hi,

I just tried the following and was surprised that my original table was modified. I expected `filter` to allocate a new DataFrame, so that `setdiff!` would not modify the original. I also tried `view=false` (which is the default), but the original table is still mutated. Is this a bug?

julia> gtable = DataFrame(A=[1,2,3],B=[[1],[2],[3]])
3×2 DataFrame
 Row │ A      B
     │ Int64  Array…
─────┼───────────────
   1 │     1  [1]
   2 │     2  [2]
   3 │     3  [3]

julia> b = filter(i -> i.A == 1, gtable).B[1]
1-element Vector{Int64}:
 1

julia> setdiff!(b,[1])
Int64[]

julia> gtable
3×2 DataFrame
 Row │ A      B
     │ Int64  Array…
─────┼────────────────
   1 │     1  Int64[]
   2 │     2  [2]
   3 │     3  [3]

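That’s an interesting observation. DataFrames uses copy for these sorts of things rather than deepcopy, so the filtered column is a new vector, but its elements are the same inner arrays as in the original. I wouldn’t call this a bug, necessarily, but it is something to look out for.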
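
A quick way to see what is and is not shared is to compare object identity with `===`. This is only a minimal sketch reusing the table from the first post; the name `filtered` is just for illustration:

```julia
using DataFrames

gtable = DataFrame(A=[1, 2, 3], B=[[1], [2], [3]])
filtered = filter(row -> row.A == 1, gtable)

# The column vector of the result is a freshly allocated object ...
println(filtered.B === gtable.B)        # false

# ... but the element it holds is the very same inner array as in the original,
# so mutating it in place is visible through gtable as well.
println(filtered.B[1] === gtable.B[1])  # true
```

So `filter` does allocate a new DataFrame; it is only the nested vectors inside column `B` that are shared.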

This is expected. Base Julia works in exactly the same way.

Thank you for your responses. @bkamins, I see that this is indeed the behavior in Base Julia:

julia> x=[[1,2,3]]
1-element Vector{Vector{Int64}}:
 [1, 2, 3]

julia> y=copy(x)
1-element Vector{Vector{Int64}}:
 [1, 2, 3]

julia> setdiff!(y[1],[2])
2-element Vector{Int64}:
 1
 3

julia> x
1-element Vector{Vector{Int64}}:
 [1, 3]

Should there be an option to fully allocate a new DataFrame using deepcopy instead of copy?
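
In the meantime, two workarounds that need no new option come to mind. This is a sketch under the assumption that only the filtered result has to be independent of the original; the names `independent`, `b` and `c` are illustrative:

```julia
using DataFrames

gtable = DataFrame(A=[1, 2, 3], B=[[1], [2], [3]])

# Option 1: deep-copy the filtered result before mutating anything inside it.
independent = deepcopy(filter(row -> row.A == 1, gtable))
setdiff!(independent.B[1], [1])   # mutates only the deep copy

# Option 2: use the non-mutating setdiff, which returns a new vector
# and leaves the shared inner array untouched.
b = filter(row -> row.A == 1, gtable).B[1]
c = setdiff(b, [1])

println(gtable.B[1])  # the original inner vector is unchanged
```

Option 1 copies every nested object in the result, which may be more than needed for large tables; option 2 avoids the copy entirely when the in-place variant is not required.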

I guess we could allow passing a function to copycols. But what’s the use case exactly?

I am using it in a simulation and am storing an initial state in the DataFrame that gets updated. I guess we don’t necessarily need to implement a deepcopy option, but the docstring could be updated to say that the new DataFrame is allocated using copy rather than deepcopy.

Everywhere in the docstring where copycols is supported we write that we copy columns (not deepcopy).

If you think clarifying this would be helpful, can you please make a PR to the manual (not to the documentation, since we have very many places where the same issue is present) in a place where you think it would be most useful for you? Thank you!

Oh ok. I should check the manual then! Thanks @bkamins
