In Python, we can do:
import pyarrow.parquet as pq
filter = ('some_column', '=', 'some_value')
pq.read_table(file_path, filters=[filter]).to_pandas()
How can I apply a filter similarly when reading a parquet file in Julia?
I’ve tried:
using Parquet
filter = row -> row.some_column == "some_value"
Parquet.read_parquet(file_path, filter=filter)
But I get a MethodError saying there is no matching method:
Closest candidates are:
  Parquet.Table(::Any, ::Parquet.File, ::Tables.Schema; rows, batchsize, column_generator, use_threads) got unsupported keyword argument "filter"
But according to the source of the Parquet package, the filter option should be supported. Its docstring says:
filter: Filter function to apply while loading only a subset of partitions from a dataset. The path to the partition is provided as a parameter.
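Re-reading that docstring, my understanding is that filter only applies when the path points to a partitioned dataset (a directory of partitions), and the function receives the partition path rather than a row. A sketch of what I assume that would look like (the Hive-style directory layout here is just my assumption):

using Parquet

# Assumed layout: dataset_dir/some_column=some_value/part-0.parquet
# The filter receives each partition's path and decides whether to load it.
tbl = Parquet.read_parquet(dataset_dir, filter = path -> occursin("some_column=some_value", path))

That would also explain the MethodError above: for a single .parquet file, read_parquet builds a Parquet.Table, and that method has no filter keyword at all.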
The only thing that I’ve got working is:
using Parquet2, TableOperations, Tables, DataFrames

Parquet2.Dataset(file_path) |>
    TableOperations.filter(r -> Tables.getcolumn(r, :some_column) == "some_value") |>
    DataFrames.DataFrame
But this is much slower than the Python solution.
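For comparison, loading the whole file into a DataFrame and filtering afterwards also works, but it still reads every row from disk, whereas (as far as I understand) the PyArrow filters argument can skip entire row groups or partitions. The column and value names below mirror the example above:

using Parquet2, DataFrames

# Read the full table (no predicate pushdown), then filter column-wise.
df = DataFrame(Parquet2.Dataset(file_path))
filtered = df[df.some_column .== "some_value", :]

What I'm after is a way to push the filter into the read itself, like the filters argument of pyarrow.parquet.read_table.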