Hey hey, what’s the easiest way to read a Parquet file into an Apache Arrow dataset? In Python one can simply do `pyarrow.parquet.read_table(...)`. I know Parquet.jl and Arrow.jl exist, but I haven’t found a way to make the two work together.
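Just to make the “Arrow dataset” part concrete: with Arrow.jl I can get Arrow-backed columns when starting from an `.arrow` file, roughly like this (file name is just an example), and I’d like to end up in the same place starting from a `.parquet` file:

```julia
using Arrow, DataFrames

# Reading an Arrow IPC file gives columns backed by Arrow memory
tbl = Arrow.Table("data.arrow")        # example path
df  = DataFrame(tbl; copycols=false)   # keep the Arrow vectors instead of copying
```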
My goal is to read the Parquet file into a DataFrame and run some benchmarks comparing regular DataFrames with DataFrames backed by Apache Arrow vectors.
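The kind of comparison I have in mind looks something like this (toy data and column name are made up, and I’m producing the Arrow-backed frame by round-tripping through an Arrow file since I can’t get there from Parquet yet):

```julia
using Arrow, DataFrames, BenchmarkTools

# Plain DataFrame with ordinary Vector columns (toy data, just for the benchmark)
df_plain = DataFrame(x = rand(10^6))

# Round-trip through the Arrow IPC format to get Arrow-backed columns
Arrow.write("tmp.arrow", df_plain)
df_arrow = DataFrame(Arrow.Table("tmp.arrow"); copycols=false)

# Same aggregation on both, to compare plain Vector vs Arrow vector access
@btime sum($(df_plain.x))
@btime sum($(df_arrow.x))
```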
The only way I see to convert a Parquet file to a DataFrame is via the RecordCursor, which doesn’t seem ideal: Parquet is a columnar format, and so are DataFrames and Apache Arrow, so having to iterate over rows is really slow.
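For reference, the row-wise path I mean is roughly this (written from memory, so the exact Parquet.jl calls may be off):

```julia
using Parquet, DataFrames

p  = ParFile("data.parquet")    # open the Parquet file
rc = RecordCursor(p)            # cursor that iterates records row by row
df = DataFrame(collect(rc))     # materialize every row, then build the DataFrame
```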