Best Data Manipulation packages 2020-09 [video]

In my opinion, basically these. Obviously, they can’t deal with larger-than-RAM data, but that’s another story.

DataFrames.jl (https://github.com/JuliaData/DataFrames.jl)

DataFramesMeta.jl (https://github.com/JuliaData/DataFramesMeta.jl)

DataConvenience.jl (https://github.com/xiaodaigh/DataConvenience.jl)

Pipe.jl (https://github.com/oxinabox/Pipe.jl)

Lazy.jl (https://github.com/MikeInnes/Lazy.jl)

I usually just roll with DataFrames.jl + Pipe.jl for these kinds of things. Could you briefly explain what kind of functionality the additional packages provide that you’re missing from those two?

I mentioned Pipe.jl, but I like Lazy’s @> better:

@> df begin
   groupby(:grp)
   combine(:col1 => mean => :mean_col1)
end

vs

@pipe df |>
   groupby(_, :grp) |>
   combine(_, :col1 => mean => :mean_col1)

but there is more typing.
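
For anyone who wants to run the comparison, here is a minimal self-contained sketch of the @pipe version. The toy `df` and its values are made up for illustration, and it assumes DataFrames.jl and Pipe.jl are installed.

```julia
using DataFrames
using Pipe: @pipe
using Statistics: mean

# Toy data, invented for illustration only.
df = DataFrame(grp = ["a", "a", "b"], col1 = [1.0, 3.0, 5.0])

# `_` marks where the result of the previous step is inserted.
result = @pipe df |>
    groupby(_, :grp) |>
    combine(_, :col1 => mean => :mean_col1)
```

The result has one row per group, with the group mean in `mean_col1`.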

But using Lazy is dangerous, as it exports groupby, which clashes with DataFrames.groupby.
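
One option, if you would rather not add another package, is Julia's selective import: it brings in only the @> macro and leaves Lazy's groupby out of the namespace. A sketch, assuming Lazy.jl and DataFrames.jl are both installed. The toy data is made up.

```julia
using DataFrames
using Lazy: @>          # import only the macro, not Lazy's groupby
using Statistics: mean

# Toy data, invented for illustration only.
df = DataFrame(grp = ["a", "a", "b"], col1 = [1.0, 3.0, 5.0])

# With only @> imported, `groupby` here refers to DataFrames.groupby.
result = @> df begin
    groupby(:grp)
    combine(:col1 => mean => :mean_col1)
end
```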

So using DataConvenience is what I prefer, as it only (re)exports @>. Plus, it has other convenience functions I like, such as sampling a dataframe with sample(df, 0.05).

DataFramesMeta.jl can do things like

@transform(df, x = fn(:y)) instead of transform(df, :y => fn => :x)
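
To make the transform form concrete: in plain DataFrames.jl, `:y => fn => :x` passes the whole column to `fn`, while `ByRow` applies a function element by element. Below is a small sketch with a made-up `df` and a hypothetical `fn`. I believe newer DataFramesMeta releases write the left-hand side as `:x` rather than `x`, so check the version you have.

```julia
using DataFrames

# Toy data, invented for illustration only.
df = DataFrame(y = [1, 2, 3])

# Hypothetical function that works on a whole column (vector in, vector out).
fn(v) = v .- minimum(v)

# Whole-column form: fn receives the full vector df.y.
df1 = transform(df, :y => fn => :x)

# Element-wise form: ByRow applies the function to each value separately.
df2 = transform(df, :y => ByRow(abs2) => :x)
```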