How to remove columns from a DataFrame

I have a Parquet dataset with 32 million rows, read using read_parquet.
Complex processing completes in seconds.

But when I try to drop some columns using
select!(data, Not([:column1, :column2]))

it does not complete even after a day. What is the best way to remove columns without copying the whole data frame? I will try to come up with a generic example.

select! does not copy data and should be very fast, so it is surprising that you have a problem here. Can you please give a reproducible example?

Here is my try:

julia> df = DataFrame(rand(Bool, 32_000_000, 100), :auto);

julia> @time select!(df, Not([:x2, :x10]));
  0.000060 seconds (165 allocations: 18.430 KiB)

As you can see this operation was designed to be very lightweight and fast.
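
To check the "no copy" claim directly rather than by timing, here is a small sketch. It assumes a plain in-memory DataFrame (not the Parquet-backed table from the question), and the names `kept`, `df2` and `shared` are only for illustration. It compares object identity of a column vector before and after the call:

```julia
using DataFrames

# Small stand-in table; column names x1..x5 come from :auto
df = DataFrame(rand(Bool, 1_000, 5), :auto)

# Keep a reference to the vector that stores column x1 (df.x1 does not copy)
kept = df.x1

# Drop two columns in place
select!(df, Not([:x2, :x4]))

names(df)          # ["x1", "x3", "x5"]
df.x1 === kept     # same vector object, so the column was not copied

# Non-mutating variant: a new DataFrame that shares columns with the source
df2 = DataFrame(rand(Bool, 1_000, 5), :auto)
shared = select(df2, Not([:x2, :x4]); copycols=false)
shared.x1 === df2.x1
```

If the identity checks hold for your data but the call is still slow, the time is probably being spent somewhere other than in select! itself, which is why a reproducible example would help.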
