Segfault on DataFrames `getindex`

Definitely an AbstractDataFrame.
I believe it is a SubDataFrame created by selecting 2 columns (:n and one other, call it :d) from a larger dataframe. (There is a chance that the dataframe it was sliced from was itself a SubDataFrame; I’d have to go looking.)

It has had a marginally unpleasant life.
Here is a bit more of its life.

    disallowmissing!(dropmissing!(mydataframe))

    # Cause `disallowmissing!` doesn't seem to change the column type away from Any on 0.6?
    mydataframe[:d] = Vector{Float64}(mydataframe[:d])

    stds = [std(df[:d]) for df in groupby(mydataframe, :n)]
    for (i, df) in enumerate(groupby(mydataframe, :n))
        for foo in (1,2)
            x = df[1, :n]
            # other operations on `x` and `foo` that do not touch any dataframes
            @show 1 # Adding this line stops the segfault.
        end
    end

At this stage it had to be a DataFrame, as the first operation is unsupported for SubDataFrame, and the second would not change the eltype of column :d if mydataframe were a SubDataFrame.

And actually, given your comment, it would be also interesting to know if these two variants work or fail:

    for (i, df) in enumerate(groupby(DataFrame(mydataframe), :n))
        for foo in (1,2)
            x = df[1, :n]
            # other operations on `x` and `foo` that do not touch any dataframes
            @show 1 # Adding this line stops the segfault.
        end
    end

and

    for (i, df) in enumerate(groupby(DataFrame(eachcol(mydataframe, true)...), :n))
        for foo in (1,2)
            x = df[1, :n]
            # other operations on `x` and `foo` that do not touch any dataframes
            @show 1 # Adding this line stops the segfault.
        end
    end

To be clear, the question of the origin of the data frame is essential. If it’s really a SubDataFrame extracted from a previous groupby operation, then it’s likely that the bug has been fixed by #1709. That would explain why it’s gone in 0.17.1, and then everything would be fine. But as noted by @bkamins, it can’t be a SubDataFrame if you call disallowmissing! on it.
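
One way to settle the origin question is to check the concrete type directly. The following is only a minimal sketch, assuming a recent DataFrames.jl release (the `view(df, rows, cols)` form may not be available on the 0.17-era versions discussed here); `big`, `sub`, and `fresh` are hypothetical stand-ins, not the data from this thread.

```julia
using DataFrames

# Hypothetical stand-in for the larger data frame mentioned above.
big = DataFrame(n = [1, 1, 2, 2],
                d = [0.5, 1.5, 2.5, 3.5],
                other = ["a", "b", "c", "d"])

# Selecting two columns through `view` gives a SubDataFrame that shares
# storage with `big` (assumption: recent DataFrames.jl).
sub = view(big, :, [:n, :d])

# Materializing the view gives an independent DataFrame.
fresh = DataFrame(sub)

# Print the concrete types to see which kind of object we are holding.
println(typeof(sub))
println(typeof(fresh))

# `parent` returns the data frame that a view was taken from.
println(parent(sub) === big)
```

Running `typeof(mydataframe)` just before the `groupby` call would show which of the two cases applies.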

@nalimilan This is exactly what I am trying to rule out. That is why I suggested testing what happens if we freshly create a DataFrame (the two approaches differ in that one copies the old index and the other creates it afresh, which might also make a difference).
