Release announcements for DataFrames.jl

All tutorials referenced in Introduction · DataFrames.jl have been updated to DataFrames.jl 1.4.0.

Some conclusions from the process:

  • Over the years we have accumulated a lot of curated tutorials. After going through them while updating, I am convinced that carefully studying them is sufficient to work confidently with DataFrames.jl.
  • The updating process mostly required adding new functionalities and fixing the explanation of the broadcasting assignment rule (see Pull Request #3022, "make broadcasting assignment consistent with !", by bkamins in JuliaData/DataFrames.jl); other than that it was smooth (which is a good sign :smile:).
  • PrettyTables.jl with the HTML backend works really nicely; thank you @Ronis_BR for working on it (I have opened some issues about things I noticed while going through loads of outputs; they can serve as ideas for further improvements).

Minor typos in the documentation (the examples are not displayed correctly, ```jldoctest was missing)

https://dataframes.juliadata.org/stable/lib/functions/#Base.invpermute!
https://dataframes.juliadata.org/stable/lib/functions/#Base.permute!
https://dataframes.juliadata.org/stable/lib/functions/#Random.shuffle
https://dataframes.juliadata.org/stable/lib/functions/#Random.shuffle!

I corrected them, but I don’t know how changes to the documentation are handled. So, let me know if everything is fine.

Thanks for your hard work!!



DataFrames.jl 1.5.0 is out.

You can find a list of all changes since 1.4.4 here and a summary of the most important additions in NEWS.md.

Let me briefly summarize the most important things that will affect almost everyone using DataFrames.jl:

  • DataFrames.jl is Julia 1.9 ready; we have improved precompilation so that things will be more snappy;
  • groupby now fully supports all kinds of sorting options, which lets you specify the resulting group order;
  • joining functions now support the order keyword argument, which lets the user specify the order of the rows in the produced table (this is a big, long-requested convenience feature);
  • the Cols column selector has been improved (it allows performing any set operation on the passed arguments and passing multiple predicate functions that perform column selection).
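To make the three API additions above more concrete, here is a minimal sketch, assuming DataFrames.jl 1.5.0 or later. The data and the variable names (`left`, `right`, `picked`, and so on) are made up for illustration; the docstrings of `groupby`, `innerjoin`, and `Cols` remain the authoritative description of the accepted options.

```julia
using DataFrames

# Hypothetical toy data, used only for this illustration.
df = DataFrame(id = [2, 1, 2, 3], x1 = 1:4, x2 = 5:8)

# groupby: a named tuple passed as `sort` is forwarded as sorting options;
# here the groups are ordered by descending `id`.
gdf = groupby(df, :id; sort = (rev = true,))
group_ids = [key.id for key in keys(gdf)]

# joins: `order = :left` asks for the row order of the left table to be kept.
left = DataFrame(id = [3, 1, 2], a = ["c", "a", "b"])
right = DataFrame(id = [1, 2, 3], b = [10, 20, 30])
joined = innerjoin(left, right; on = :id, order = :left)

# Cols: several predicate functions combined with a set operation;
# here only columns whose name starts with "x" AND ends with "2" are selected.
picked = names(df, Cols(startswith("x"), endswith("2"); operator = intersect))
```

Without the `order` keyword argument the row order of a join result is not guaranteed, which is why code that relied on it previously had to sort the result afterwards.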

The precompilation support in DataFrames.jl has two modes:

  • full precompilation;
  • no precompilation.

The default is full precompilation. In this mode the package should precompile in around 50 seconds and then its load time should be around 1.8 seconds. The benefit of full precompilation is that later commonly used functions do not need to be compiled so that you will have a more responsive experience.

The no precompilation mode disables precompilation. Then the package precompiles in around 5 seconds, and its load time is under 1 second. The downside is that later every function needs to be compiled when it is used.

To give you a flavor of the difference, the following example code:

```julia
using DataFrames
df = DataFrame(rand(5, 3), :auto)
combine(df, :x1 => sum)
combine(df, All() .=> minimum)
df.id = [1, 1, 2, 2, 2];
gdf = groupby(df, :id);
transform(gdf, AsTable(Cols(r"x")) => ByRow(sum))
```

runs in around 4.4 seconds without precompilation and 2.4 seconds with precompilation (note that timings include package load time).

Instructions on how to turn precompilation on or off are given here. Note that this can be done on a per-project-environment basis.



All tutorials listed in Introduction · DataFrames.jl are now updated to DataFrames.jl 1.5.0 and Julia 1.9.


DataFrames.jl 1.6.0 has just been released (so it can be field tested by users before JuliaCon2023 :smile:).

This release focused mostly on code cleanup, improving API consistency, and integration issues. You can find the list of user-visible changes here and of all changes here.

I want to highlight three changes (the first two are likely to be used often in daily work with DataFrames.jl; the third could potentially break some existing code. This is unlikely, but users should be aware of the risk):

  • The Not selector is more convenient to use: it now allows passing multiple positional arguments, which are treated as if they were wrapped in Cols, and it no longer throws an error when a vector of duplicate indices is passed during column selection.
  • The DataFrame constructor now allows passing column names that replace the names generated by default.
  • All Tables.AbstractRow subtypes are now treated in the same way as DataFrameRow in all operations; this could be minimally breaking if users relied on Tables.AbstractRow being treated as a scalar by combine in the past (the change follows reports that treating Tables.AbstractRow as a scalar borders on being a bug).
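The Not change can be illustrated with a short sketch, assuming DataFrames.jl 1.6.0 or later. The data frame and the variable names `kept` and `kept_dup` are hypothetical and serve only as an example.

```julia
using DataFrames

# Hypothetical one-row data frame, used only for this illustration.
df = DataFrame(a = [1], b = [2], c = [3], d = [4])

# Several positional arguments are treated as if they were wrapped in Cols,
# so this drops both :a and :b.
kept = names(df, Not(:a, :b))

# A vector with a duplicated index is accepted when selecting columns.
kept_dup = names(df, Not([:a, :a]))
```

Before this change the first selection had to be written as `Not(Cols(:a, :b))` or `Not([:a, :b])`.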

The list of functionalities planned for the 1.7 release can be found in the 1.7 Milestone on GitHub.


From strength to strength. Comprehensive write-up. Does Bogumil sleep?


This write-up about DataFrames.jl in the JOSS journal is outstanding. The detailed discussion of the design choices is very informative. I have been using DataFrames.jl for many years, but this article adds a new perspective on various nuances of the package.

Kudos to Bogumil and Milan.


I think he had a brief nap back in 2014, but not since then.
