We have accumulated a lot of curated tutorials over the years. After going through them while updating, I am convinced that studying them carefully is sufficient to work confidently with DataFrames.jl.
The HTML backend of PrettyTables.jl works really well; thank you @Ronis_BR for working on it (I have opened some issues about things I noticed while going through loads of outputs; they can serve as ideas for further improvements).
You can find a list of all changes since 1.4.4 here and a summary of the most important additions in NEWS.md.
Let me briefly summarize the most important changes, which will affect almost everyone using DataFrames.jl:
DataFrames.jl is Julia 1.9 ready; we have improved precompilation so that things will be snappier;
groupby now fully supports all kinds of sorting options for specifying the resulting group order;
joining functions now support the order keyword argument, which lets the user specify the order of the rows in the produced table (this is a major, long-requested convenience feature);
the Cols column selector has been improved: it now allows performing any set operation on the passed arguments and accepts multiple predicate functions that perform column selection.
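To make the last three points concrete, here is a minimal sketch, assuming DataFrames.jl 1.5 or later. The data frames and their column names are made up for illustration; the keyword arguments used (sort, order, operator) are, as far as I know, the ones documented for groupby, the join functions, and Cols.

```julia
using DataFrames

# Hypothetical example data, not taken from the release notes
df = DataFrame(g = ["b", "a", "c", "a"], x = 1:4)

# groupby: a NamedTuple passed as `sort` is treated as sorting options,
# here asking for groups in descending order of the key
gd = groupby(df, :g; sort = (rev = true,))
group_order = [k.g for k in keys(gd)]

# joins: `order = :left` keeps the row order of the left table
l = DataFrame(id = [3, 1, 2], a = ["c", "a", "b"])
r = DataFrame(id = [1, 2, 3], b = [10, 20, 30])
joined = innerjoin(l, r; on = :id, order = :left)

# Cols: several predicates combined with a set operation other than
# the default union (predicates receive column names as strings)
wide = DataFrame(a1 = 1, a2 = 2, b1 = 3, b2 = 4)
sel = select(wide, Cols(startswith("a"), endswith("1"); operator = intersect))
```

With order = :right the join would follow the row order of the right table instead, and leaving order at its default makes no promise about row order.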
The precompilation support in DataFrames.jl has two modes:
full precompilation;
no precompilation.
The default is full precompilation. In this mode the package should precompile in around 50 seconds, and its load time should then be around 1.8 seconds. The benefit of full precompilation is that commonly used functions do not need to be compiled later, so you will have a more responsive experience.
The no precompilation mode disables precompilation. The package then precompiles in around 5 seconds, and its load time is under 1 second. The downside is that every function needs to be compiled later, when it is first used.
To give you a flavor of the difference, the following example code:
DataFrames.jl 1.6.0 has just been released (so that users can field-test it before JuliaCon 2023).
This release focused mostly on code cleanup, improving API consistency, and integration issues. You can find the list of user-visible changes here and of all changes here.
I want to highlight three changes (the first two are likely to be used often in daily work with DataFrames.jl; the third could potentially break some existing code, which is unlikely, but users should be aware of the risk):
The Not selector is now more convenient to use: it allows passing multiple positional arguments, which are treated as if they were wrapped in Cols, and it does not throw an error when a vector of duplicate indices is passed during column selection;
the DataFrame constructor now allows passing column names that replace the names generated by default;
all Tables.AbstractRow subtypes are now treated in the same way as DataFrameRow in all operations; this could be minimally breaking if users relied on Tables.AbstractRow being treated as a scalar by combine in the past (the change follows requests arguing that treating Tables.AbstractRow as a scalar borders on being a bug).
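As a small illustration of the first change, here is a sketch assuming DataFrames.jl 1.6 or later (with the matching InvertedIndices.jl release); the data frame is made up for the example.

```julia
using DataFrames

# Hypothetical example data
df = DataFrame(a = 1, b = 2, c = 3, d = 4)

# Multiple positional arguments behave like Not(Cols(:a, :b))
without_ab = select(df, Not(:a, :b))

# A vector with duplicate entries no longer throws during column selection
without_a = select(df, Not([:a, :a]))
```

On earlier releases the first call is not available and the second one throws an error, so code relying on either form needs 1.6 as its lower compat bound.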
The list of functionalities planned for the 1.7 release can be found here: 1.7 Milestone · GitHub.
This write-up about DataFrames.jl in the JOSS journal is outstanding. The detailed discussion of the design choices is very informative. I have been using DataFrames.jl for many years, but this article adds a new perspective on various nuances of the package.