How well is Apache Arrow’s zero-copy methodology supported?

@nilshg , sorry for the slow response; from your example, it does seem like some over-copying is going on when materializing the views on your arrow data. I’ll look into it.

@Sami, in your example, as long as there’s still a reference to the original at (the table backed by the Arrow file), that memory isn’t going to be freed/reclaimed. So I don’t see anything unexpected in your example. If you were to set at = nothing; GC.gc(); GC.gc() and the memory still wasn’t freed, that would be a concern.
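As a minimal sketch of that check (not Arrow-specific): the plain array below is a hypothetical stand-in for the Arrow table, and a WeakRef is used only to observe whether the object is still alive. Exactly when the collector reclaims memory is up to the runtime, so treat the second result as the expected behaviour rather than a guarantee.

```julia
# `at` is a stand-in for an Arrow.Table; a plain array keeps the sketch dependency-free.
at = zeros(UInt8, 10_000_000)        # roughly 10 MB
w = WeakRef(at)                      # observes the array without keeping it alive

GC.gc(); GC.gc()
still_alive = w.value !== nothing    # `at` still references the array, so it cannot be collected

at = nothing                         # drop the only strong reference
GC.gc(); GC.gc()
reclaimed = w.value === nothing      # expected once the collector has run

println("alive while referenced: ", still_alive)
println("reclaimed after release: ", reclaimed)
```

If the second line reported false for a real Arrow table after at = nothing, that would be the case worth reporting.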


Thanks @quinnj. I had misunderstood how the let block works. I ran this code:

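using Tables
using Arrow
using TableOperations

println("start")
let at = Arrow.Table("big.arrow")
    # Select three columns, keep rows with total_amount > 10, then materialize.
    let t = at |>
            TableOperations.select(:vendor_id, :passenger_count, :total_amount) |>
            TableOperations.filter(x -> x.total_amount > 10) |>
            Tables.columntable
        println("in")
    end
end
println("out")
# `at` and `t` are out of scope here, so a full collection can reclaim them.
GC.gc(); GC.gc()
println("done")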

and the memory is freed at the end.

Ah yes, that’s correct. It’s the variables bound in the let clause that are temporary, i.e. they live only for the duration of the block.
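A small sketch of that scoping rule, using only Base Julia; the names tmp and total are made up for illustration. Variables introduced by a let block are local to it, so nothing outside the block can reference them once it ends.

```julia
let tmp = collect(1:5)            # `tmp` exists only inside this block
    total = sum(tmp)              # `total` is also local to the block
    println("inside: ", total)
end

# Neither name is visible once the block has ended.
tmp_visible = @isdefined(tmp)
total_visible = @isdefined(total)
println("tmp visible outside: ", tmp_visible)
println("total visible outside: ", total_visible)
```

Because no binding survives the block, whatever tmp pointed to becomes eligible for collection on a later GC.gc() call.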

This explanation helps in so many ways

It would be nice if TableOperations continued this philosophy. Many jobs start needing to saw up huge data sets and that tool doesn’t have to be the same one used for fine data frame transformations.