Bug with filtering DataFrames in combination with ShiftedArrays

  1. Good factcheck, Base.mightalias is an internal function because it only checks for some Base types, falling back to an often wrong false otherwise. It indeed doesn’t work on ShiftedArrays because it was never implemented for it. I misread its use in DataFrames, the actual alias check is just a much simpler check of whether deleting entries of the first column had reduced the length of the others:
    for i in 2:length(cols)
        col = cols[i]
        # this check is done to handle column aliases
        if length(col) == n
            deleteat!(col, drop)
        end
    end
  1. It would error if the columns weren’t aliasing each other because deleteat! is intentionally not implemented for ShiftedArrays, and it wouldn’t need to error in some very useful cases of aliasing (ShiftedArray(x) defaulting to 0 shift would be entirely safe). The “What’s the aliasing story” thread also had proponents of hard errors or automatic handling, but that unfairly punishes some entirely proper use cases or adds an uncertain heuristic every single call. Not wasting electricity on the same check in a hot loop justifies the existence of functions that don’t perform that check, and semantically ensuring unaliased inputs or intermediates (for example, like using the non-inplace filter for DataFrames) is much better than relying on an uncertain heuristic. That’s what made me take the side of “document this”, and the copycols keyword section doesn’t really say why there’s a true/false distinction. It’s easy to run into these issues if people overlook array aliasing (ShiftedArrays does openly say it’s lazy) or go overboard with preallocating memory.