File IO - Parquet File Reader


#1

Is there a Parquet file reader that supports Julia v1.0 and can load data into a DataFrame? I tried the example from Queryverse.jl, but it seems to require ParquetFiles.jl. I get the following when I try to add the ParquetFiles package.

(v1.0) pkg> add ParquetFiles
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
ERROR: Unsatisfiable requirements detected for package ParquetFiles [46a55296]:
 ParquetFiles [46a55296] log:
 ├─possible versions are: 0.0.1 or uninstalled
 ├─restricted to versions * by an explicit requirement, leaving only versions 0.0.1
 └─restricted by julia compatibility requirements to versions: uninstalled — no versions left

#2

ParquetFiles.jl uses Parquet.jl under the hood, and I’m afraid the latter has not yet been ported to Julia 1.0. I’m not aware of any other Parquet reader for Julia.


#3

I just tagged new releases of all the Parquet-related packages, so everything should now work on Julia 1.0.
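
For anyone landing here, a minimal sketch of loading a Parquet file into a DataFrame, assuming ParquetFiles.jl and DataFrames.jl are both installed and that `data.parquet` (a placeholder path, not a real file) exists in the working directory:

```julia
# Assumes both packages were added first: pkg> add ParquetFiles DataFrames
using ParquetFiles, DataFrames

# "data.parquet" is a hypothetical path; substitute your own file.
df = load("data.parquet") |> DataFrame

# Quick sanity check: print the (rows, columns) of the loaded table.
println(size(df))
```

`load` returns a table source that can be piped into any compatible sink, so `DataFrame` here could be swapped for another Queryverse-style sink.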


#4

Thanks so much for this, @davidanthoff. I haven’t used this myself yet, but it’s definitely important that we have this functionality in Julia. Parquet files are increasingly common (though, honestly, I still don’t understand what their intended use case is).


#5

I think the goal was a good balance between portability and query efficiency, which is very useful in data lakes.