Packages for DataFrame manipulation/query

#1

Hi all, newcomer here!

I know there has been a lot of discussion in the past about DataFrames and where it's headed, but my question is about the current packages for DataFrame manipulation.

I've come across both DataFramesMeta.jl and Query.jl. I'd like to know what the differences between them are, and whether there are more mature packages for the same purpose. 🙂

Any help and guidance is appreciated. 😄


#2

You have the right link but the wrong name for DataFramesMeta. There is also JuliaDBMeta.

I'm writing this on a phone so I will be brief. A fundamental difference between the approaches is how they represent missing data. The approach favored in v0.7 is to use union types with `missing` (e.g. `Union{Float64, Missing}`), which would lead you to DataFramesMeta, whereas Query.jl represents missing values with `DataValue` from DataValues.jl. On the other hand, the "Queryverse" packages are more fully featured and mesh together well. JuliaDB and its query package allow distributed processing and are backed by some high-powered support at Julia Computing. All three have advantages.
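For anyone new to this: since `missing` lives in Base as of 0.7, a column that can contain missing values simply has an element type like `Union{Missing, Int}`. Here is a minimal sketch using only Base Julia (no packages) to show how such values behave:

```julia
# A vector with a missing entry gets the element type Union{Missing, Int}
x = [1, missing, 3]
println(eltype(x) == Union{Missing, Int})  # true

# missing propagates through most arithmetic
println(x[2] + 1)                          # missing

# Common ways to handle it
println(sum(skipmissing(x)))               # 4 (skip the missing values)
println(coalesce.(x, 0))                   # [1, 0, 3] (replace missing with a default)
println(count(ismissing, x))               # 1 (count missing entries)
```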


#3

Oops! Thanks for the correction!

I'll check out JuliaDB as well.

Are there any big performance differences between the two, then? I found the documentation for DataFramesMeta a bit sparser, but I'm feeling adventurous enough to invest time in it. 🙂

What is this Queryverse? Is there a resource to check these packages out?


#4

As I understand it, Query is a set of tools meant to work with many tabular data structures, including DataFrames, IterableTables and such, that implement a common interface. There's a great talk on the library here.

DataFramesMeta is a set of meta programming tools specifically for DataFrames.
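To make the overlap concrete, here is a rough side-by-side of the same row filter in both packages. It's a sketch assuming current releases of DataFrames.jl, DataFramesMeta.jl and Query.jl (DataFramesMeta's macros have been renamed over time: older releases used `@where` where newer ones use `@subset`), and the `df` table is made up for illustration:

```julia
using DataFrames, DataFramesMeta, Query

# Hypothetical example table
df = DataFrame(name = ["Ann", "Bob", "Cid"], age = [25, 40, 31])

# DataFramesMeta: macros that refer to DataFrame columns as :symbols
older_meta = @subset(df, :age .> 30)

# Query.jl: a LINQ-style pipeline over any iterable table,
# collected back into a DataFrame at the end
older_query = df |> @filter(_.age > 30) |> DataFrame

println(older_meta)
println(older_query)
```

Both should keep the rows for Bob and Cid. The DataFramesMeta version is tied to DataFrames, while the Query version would work the same way on other table sources.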

Queryverse is a meta-package: you load it (using Queryverse) and it loads a whole bunch of useful packages, like HDF5, Feather, Query, DataFrames, etc.


#5

As an addendum: those are the differences, but there's also a lot of overlap. Julia took a "let a hundred flowers bloom" approach to the package ecosystem. That led to a lot of interesting experimentation, but it also means there are often several packages that do similar things in similar ways. I think Query and DataFramesMeta are a case of that.


#6

This ecosystem is definitely a different approach from what I'm used to, and a lot of the packages are really new, with no documented benchmarks or user guides.

Hopefully I can find my way through it.

Thanks a lot for all the help!