Which one is better for data wrangling in general? I want to do some data science things, e.g. wrangle, group, aggregate…
Both are OK, but which one is better in the long run?
I’m partial because I maintain DataFramesMeta and contribute to DataFrames, but DataFramesMeta + DataFrames is a nice user experience imo. And don’t worry about performance from a typed object; DataFrames takes care of all that for you.
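To make the "wrangle, group, aggregate" workflow concrete, here is a minimal sketch of what it could look like with DataFramesMeta. The data and the column names (group, wage, high, mean_wage, n_high) are made up for illustration, and it assumes DataFrames.jl and DataFramesMeta.jl are installed.

```julia
using DataFrames, DataFramesMeta
using Statistics: mean

# Toy data, invented for this example.
df = DataFrame(group = ["a", "a", "b", "b"], wage = [10.0, 20.0, 30.0, 50.0])

# Wrangle: add a column computed row by row.
df2 = @rtransform df :high = :wage > 15

# Group and aggregate: one output row per value of :group.
by_group = @by df2 :group begin
    :mean_wage = mean(:wage)
    :n_high = sum(:high)
end
```

Inside @by the column symbols refer to whole per-group vectors, while @rtransform works one row at a time, which is roughly the split between Stata's egen and gen.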
I use Stata for most of my job, so the closer to Stata, the better. (Stata will stay my main software.)
I also couldn’t figure out how to unstack a typed table.
I am a heavy Stata user in my day job. I wouldn’t say that DataFramesMeta is designed to be close to Stata; it’s much more inspired by dplyr. But I am very aware of what Stata does well and hope to implement all of its best features. Work through this tutorial and you will see similarities with Stata.
In Stata I am very used to the _n syntax. Is it possible to do the same in DataFramesMeta.jl?
Btw, I wonder if a subset of DataFramesMeta can be extended to Tables.jl. For “our” use, I’d love to have some fast equivalent of byrow available for TypedTable, because DataFramesMeta’s syntax is nice and looping over the rows of a TypedTable should be fastest by construction.
I mean, in DataFramesMeta things are just arrays, so you can do for i in eachindex(:variable_name) and work with it (even inside grouping operations). There are also functions such as lag from ShiftedArrays.jl.
_n is used to fill in missing values etc. There are better idioms for that in Julia.
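As a rough sketch of the eachindex and lag suggestion above, this is one way to mimic Stata's by id: gen n = _n and x[_n-1] within groups. The data and the column names (id, x, n, lag_x) are made up for illustration, and it assumes DataFrames.jl, DataFramesMeta.jl, and ShiftedArrays.jl are installed.

```julia
using DataFrames, DataFramesMeta
using ShiftedArrays: lag

# Toy panel, invented for this example.
df = DataFrame(id = [1, 1, 2, 2, 2], x = [10, 20, 30, 40, 50])

gd = groupby(df, :id)

# Inside @transform on a grouped data frame, :x is the vector of x values
# for the current group, so ordinary array functions apply to it.
out = @transform gd begin
    :n = collect(eachindex(:x))  # within-group row number, like Stata's _n
    :lag_x = lag(:x)             # previous value within the group, missing for the first row
end
```

The result keeps the original row order, with n restarting at 1 in each group and lag_x set to missing on each group's first row.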
If you work with TypedTables for an hour, you will see it is not suitable for most data wrangling tasks (and it is slow despite being type-stable).
I am also opinionated here, but my answer would be:
- If you do not know what to use, then use DataFrames.jl, as it is designed as an introductory-level package that internally handles most of the issues users might encounter, without exposing them.
- TypedTables.jl is great but requires much more knowledge about Julia to use it efficiently.