Best Data Manipulation packages 2020-09 [video]

In my opinion, basically these. Obviously, they can’t deal with larger-than-RAM data, but that’s another story.

DataFrames.jl (GitHub - JuliaData/DataFrames.jl: In-memory tabular data in Julia)

DataFramesMeta.jl (GitHub - JuliaData/DataFramesMeta.jl: Metaprogramming tools for DataFrames)

DataConvenience.jl (GitHub - xiaodaigh/DataConvenience.jl: Convenience functions missing in Julia)

Pipe.jl (https://github.com/oxinabox/Pipe.jl)

Lazy.jl (https://github.com/MikeInnes/Lazy.jl)


I usually just roll with DataFrames.jl + Pipe.jl for these kinds of things. Could you briefly explain what kind of functionality the additional packages provide that you’re missing from those two?

I mentioned Pipe.jl for comparison, as I like Lazy’s @> better:

@> df begin
   groupby(:grp)
   combine(:col1 => mean => :mean_col1)
end

vs

@pipe df |>
   groupby(_, :grp) |>
   combine(_, :col1 => mean => :mean_col1)

but the Pipe.jl version needs more typing.

But using Lazy is dangerous, as it exports groupby, which clashes with DataFrames.groupby.

So using DataConvenience is what I prefer, as it only (re)exports @>. Plus, it has other convenience functions I like, such as sampling a dataframe with sample(df, 0.05).
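
Another way around the groupby clash, as a minimal sketch: Julia lets you import a single name from a package, so `using Lazy: @>` should bring in only the macro and leave DataFrames.groupby unambiguous. The toy dataframe and its column values are made up for illustration.

```julia
using DataFrames
using Statistics      # provides mean
using Lazy: @>        # import only the threading macro, not Lazy's groupby

# Made-up example data
df = DataFrame(grp = ["a", "a", "b"], col1 = [1.0, 3.0, 5.0])

# @> threads df in as the first argument of each call in the block
result = @> df begin
    groupby(:grp)
    combine(:col1 => mean => :mean_col1)
end
```

Here groupby resolves to DataFrames.groupby, since Lazy's own groupby was never brought into scope.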

DataFramesMeta.jl can do things like

@transform(df, x = fn(:y)) instead of transform(df, :y => fn => :x)
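
To make the comparison concrete, here is a small sketch. `fn` is a hypothetical helper standing in for whatever function you apply, and the data is made up. It assumes a recent DataFramesMeta.jl release, where (as far as I know) the new column on the left-hand side is written as a Symbol, `:x = ...`, rather than the `x = ...` form shown above.

```julia
using DataFrames
using DataFramesMeta

# Made-up example data
df = DataFrame(y = [1, 2, 3])

# Hypothetical helper; it receives the whole column as a vector
fn(col) = col .* 2

# Plain DataFrames.jl: source column => function => new column name
df1 = transform(df, :y => fn => :x)

# DataFramesMeta.jl: refer to columns as Symbols inside an ordinary expression
# (assumes the newer `:x = ...` left-hand-side syntax)
df2 = @transform(df, :x = fn(:y))
```

Both calls return a new dataframe with an added column x and leave df itself unchanged.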

BTW, Lazy.jl and Pipe.jl have been dropped and replaced by Chain.jl.
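
For reference, a minimal sketch of the same group-and-summarise pipeline written with Chain.jl's @chain macro. The dataframe is made up for illustration; the column names follow the earlier snippets.

```julia
using DataFrames
using Chain
using Statistics      # provides mean

# Made-up example data
df = DataFrame(grp = ["a", "a", "b"], col1 = [1.0, 3.0, 5.0])

# Each line receives the previous result as its first argument
# (use _ to place it elsewhere)
result = @chain df begin
    groupby(:grp)
    combine(:col1 => mean => :mean_col1)
end
```

This reads like the Lazy @> version, and Chain.jl only exports the @chain macro, so there is no groupby clash.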


The stuff here is outdated. See "[Video] Best Julia packages for manipulating tabular (dataframe) data" for the latest recommendations.