@nilshg, sorry for the slow response. From your example, it does look like some over-copying is happening when materializing the views on your Arrow data. I'll look into it.
@Sami, in your example, as long as there's still a reference to the original `at` object, that memory isn't going to be freed or reclaimed, so I don't see anything unexpected. If you were to run `at = nothing; GC.gc(); GC.gc()` and the memory still wasn't freed, that would be a concern.
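To make the reachability point concrete, here is a minimal sketch in plain Base Julia, with no Arrow involved. `Payload`, `make_payload`, and `collected` are hypothetical names for illustration. A finalizer is used only as a probe to observe when the GC reclaims the object. While a global still refers to the object it cannot be collected. Exactly when the finalizer runs after the reference is dropped is an implementation detail, so treat the second printout as typical rather than guaranteed.

```julia
# Hypothetical stand-in for a large table: a mutable wrapper around a big buffer.
mutable struct Payload
    data::Vector{UInt8}
end

# Flag flipped by the finalizer once the GC has actually reclaimed the object.
const collected = Ref(false)

function make_payload()
    p = Payload(zeros(UInt8, 10_000_000))   # ~10 MB buffer
    finalizer(_ -> (collected[] = true), p)  # probe only: runs after `p` is collected
    return p
end

at = make_payload()

# `at` still refers to the object, so it is reachable and a full
# collection cannot free it; `before` is therefore false.
GC.gc(); GC.gc()
before = collected[]
println("collected while referenced: ", before)

# Drop the last reference and collect again. The finalizer typically runs
# as part of this collection, but the exact timing is not guaranteed.
at = nothing
GC.gc(); GC.gc()
println("collected after `at = nothing`: ", collected[])
```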
Thanks @quinnj. I had misunderstood `let` blocks before. I ran this code:
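```julia
using Tables
using Arrow
using TableOperations

println("start")
let at = Arrow.Table("big.arrow")
    # Keep three columns, then the rows with total_amount > 10, and
    # materialize the result as a NamedTuple of vectors.
    let t = at |>
            TableOperations.select(:vendor_id, :passenger_count, :total_amount) |>
            TableOperations.filter(x -> x.total_amount > 10) |>
            Tables.columntable
        println("in")
    end
end
# Both `at` and `t` are now out of scope, so a full collection may reclaim them.
println("out")
GC.gc(); GC.gc()
println("done")
```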
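An alternative to nested `let` blocks is to do the loading and filtering inside a function and return only the small result. Once the function returns, the full table is no longer referenced. The sketch below shows the shape of that pattern in plain Base Julia. A tiny made-up NamedTuple of vectors stands in for the Arrow table. `select_and_filter`, `load_toy_table`, `load_and_filter`, and the extra `trip_distance` column are hypothetical; the other column names are borrowed from the post. With the real file, you would construct `Arrow.Table("big.arrow")` inside the function instead.

```julia
# Hypothetical helper: keep a few columns and the rows with total_amount > 10,
# returning freshly allocated vectors so nothing aliases the input table.
function select_and_filter(tbl::NamedTuple)
    keep = tbl.total_amount .> 10          # Boolean row mask
    return (vendor_id       = tbl.vendor_id[keep],
            passenger_count = tbl.passenger_count[keep],
            total_amount    = tbl.total_amount[keep])
end

# Made-up toy data in column-table form (a NamedTuple of equal-length vectors).
function load_toy_table()
    return (vendor_id       = [1, 2, 1, 2],
            passenger_count = [1, 3, 2, 1],
            total_amount    = [5.0, 12.5, 30.0, 9.99],
            trip_distance   = [0.5, 2.1, 7.3, 1.0])
end

function load_and_filter()
    big = load_toy_table()         # would be Arrow.Table("big.arrow") in the real case
    return select_and_filter(big)  # `big` is unreachable once this returns
end

small = load_and_filter()
GC.gc(); GC.gc()
println(small)
```

Here the large intermediate only ever lives in a local variable, so nothing at global scope keeps it alive after `load_and_filter` returns.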
It would be nice if TableOperations continued this philosophy. Many jobs start by needing to slice up huge data sets, and the tool for that doesn't have to be the same one used for fine-grained data frame transformations.