Bug with filtering DataFrames in combination with ShiftedArrays

It seems that filtering a DataFrame in place does not interact with ShiftedArrays the way one would expect:

using DataFrames
using ShiftedArrays

issquare(n::Integer)=(n==round(sqrt(n))^2)

N=20
numbers=collect(1:N)
successors=ShiftedArrays.lead(numbers)
predecessors=ShiftedArrays.lag(numbers)

Test=DataFrame([numbers,successors,predecessors],[:n,:s,:p])
GoodFiltered=filter!( row->issquare(row.n), Test )

This gives the correct result; for example, it contains a row with n=9, s=10, p=8.
However, when I vary the code slightly and replace the last two lines with

Test=DataFrame([numbers],[:n])
Test.s=ShiftedArrays.lead(Test.n)
Test.p=ShiftedArrays.lag(Test.n)
BadFiltered=filter!( row->issquare(row.n), Test )

it gives the wrong result; for example, it contains a row with n=9, s=16, p=4.

This is very unexpected to me. There might be an explanation, but it certainly feels like filter! should not work this way, especially since filter (without the !) still gives the correct result.

Thanks for any help!


It’s an issue with copycols behaviour. For example:

Test=DataFrame([numbers],[:n])
Test.p=copy(ShiftedArrays.lag(Test.n))
Test.s=copy(ShiftedArrays.lead(Test.n))
BadFiltered=filter!( row->issquare(row.n), Test )

works as expected. In your first version, the DataFrame constructor's copycols keyword argument defaults to true, so the columns are copied. To reproduce the wrong result with the first version, pass copycols=false:

julia> Test=DataFrame([numbers,successors,predecessors],[:n,:s,:p]; 
  copycols=false)   # get bad behaviour without copy

julia> filter!( row->issquare(row.n), Test )
4×3 DataFrame
 Row │ n      s        p       
     │ Int64  Int64?   Int64?  
─────┼─────────────────────────
   1 │     1        4  missing 
   2 │     4        9        1
   3 │     9       16        4
   4 │    16  missing        9

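The bad behaviour is caused by filter! modifying, in place, memory that is aliased by several columns: without a copy, s and p are lazy ShiftedArrays views on n, so once n is filtered they show shifted versions of the filtered values. Skipping the copy is efficient, but dangerous in this case.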
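To see the aliasing in isolation, here is a minimal sketch without DataFrames. It assumes only that ShiftedArrays.lag returns a lazy wrapper around its argument and that copy materializes it into an ordinary vector; the variable names are just for illustration.

```julia
using ShiftedArrays

v = collect(1:5)
lagged = ShiftedArrays.lag(v)   # lazy view: reads from v on every access
copied = copy(lagged)           # independent vector, no longer tied to v

v[1] = 100                      # mutate the parent in place

lagged[2]   # 100, the view follows the change to v
copied[2]   # 1, the copy keeps the old value
```

filter! mutates the columns of the DataFrame in a similar in-place fashion, which is why a column that is only a view on another column ends up showing unexpected values.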
