I’m excited to announce a new release of DataFramesMeta.jl.
This release adds one major new feature, which is the ability to use
AsTable on the right-hand side of transformations. This makes it easy to work with many columns at once programmatically. In particular, it allows one to emulate Stata’s
    julia> using DataFramesMeta, Statistics;

    julia> df = DataFrame(rand(10, 100), :auto); # A wide data frame

    julia> @rselect df :row_mean = mean(AsTable(:))
    10×1 DataFrame
     Row │ row_mean
         │ Float64
    ─────┼──────────
       1 │ 0.495727
       2 │ 0.478012
       3 │ 0.449286
       4 │ 0.457304
       5 │ 0.508363
       6 │ 0.470989
       7 │ 0.49183
       8 │ 0.450141
       9 │ 0.489021
      10 │ 0.484617
AsTable works just the same as AsTable in DataFrames.jl. Behind the scenes in a transform call, we pass a NamedTuple of vectors (or in the row-wise case, a plain old NamedTuple) to the underlying function.
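To make the two shapes concrete, here is a minimal sketch in plain Julia, using only Base and the Statistics standard library. The `cols` value is a hypothetical stand-in for the columns that AsTable selects; this only illustrates what the function receives and is not how DataFrames.jl builds these values internally.

```julia
using Statistics

# Hypothetical stand-in for the columns selected by AsTable(:).
cols = (x1 = [1.0, 2.0], x2 = [3.0, 4.0], x3 = [5.0, 6.0])

# Column-wise case: the function sees one NamedTuple of vectors.
col_total = cols.x1 .+ cols.x2 .+ cols.x3    # [9.0, 12.0]

# Row-wise case: the function sees one plain NamedTuple per row.
# `map` over a NamedTuple keeps the field names.
nrows = length(cols.x1)
rows = [map(col -> col[i], cols) for i in 1:nrows]

# `mean` iterates over the values of each row's NamedTuple.
row_means = mean.(rows)                      # [3.0, 4.0]
```

The row-wise branch is the situation in the `@rselect` example above: each row arrives as a NamedTuple and `mean` reduces over its values.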
In the above example, it would appear that I take the mean of a NamedTuple, which normally carries with it large compilation costs. But thanks to great work by @bkamins and @nalimilan, DataFrames.jl uses a faster path which never materializes the named tuples; see #2869 for more details. Thanks to Julia’s modularity, DataFramesMeta.jl benefits from this excellent work.
In the future, I plan to add a @collect macro-flag (similar to @byrow) to let end-users take advantage of this fast path themselves.
There are additional compilation improvements in this release. For example, :y = f(g(:x)) used to expand to an anonymous function, meaning that writing the same transformation in two separate places would incur a compilation cost each time. Now, however, :y = f(g(:x)) gets expanded to :y = (f ∘ g)(:x), whose compilation is re-used.
This will make DataFramesMeta.jl feel snappier with long @chain blocks of operations.
    julia> using DataFrames, DataFramesMeta, Statistics

    julia> df = DataFrame(x = [1, 2]);

    julia> function inner(x)
               t = (x .- mean(x) .+ std(x)) .^2
               t ./ t
           end;

    julia> function outer(x)
               @. (x + 1) * 100 + 60
           end;

Before this release, every call paid the compilation cost again:

    julia> @select df :y = outer(inner(:x)); # TTFP compilation

    julia> @time @select df :y = outer(inner(:x)); # First try
      0.024464 seconds (17.83 k allocations: 1.045 MiB, 97.90% compilation time)

    julia> @time @select df :y = outer(inner(:x)); # Second try
      0.025492 seconds (17.82 k allocations: 1.041 MiB, 97.88% compilation time)

With this release, only the first call compiles:

    julia> @select df :y = outer(inner(:x)); # TTFP compilation

    julia> @time @select df :y = outer(inner(:x)); # First try
      0.000110 seconds (121 allocations: 6.531 KiB)

    julia> @time @select df :y = outer(inner(:x)); # Second try
      0.000107 seconds (121 allocations: 6.531 KiB)
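The reason composition helps can be sketched in plain Julia, without DataFramesMeta.jl at all. Every anonymous function literal gets its own type, so code specialized on one literal cannot be re-used for another, while `f ∘ g` always has the same type for the same `f` and `g`. The functions `f` and `g` below are hypothetical placeholders, and the sketch assumes the macro expansion behaves as described in this post.

```julia
# Hypothetical placeholder functions, standing in for user code.
f(x) = x .+ 1
g(x) = 2 .* x

# Two textually identical anonymous functions, written in two places.
anon1 = x -> f(g(x))
anon2 = x -> f(g(x))

# Two compositions of the same functions, written in two places.
comp1 = f ∘ g
comp2 = f ∘ g

# Each anonymous function literal has its own type, so anything
# specialized on anon1 has to be compiled again for anon2.
same_anon_type = typeof(anon1) == typeof(anon2)   # false

# Both compositions share one type, so compiled code can be re-used.
same_comp_type = typeof(comp1) == typeof(comp2)   # true

# The two forms still compute the same result.
same_result = comp1([1, 2]) == anon1([1, 2])      # true
```

Since Julia compiles a specialization per argument type, passing a value of an already-seen type is what lets the second call skip compilation.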
And of course plenty of docs fixes. Thank you to everyone who helped out!
See the News.md here.
In the next release we will add:
- @collect macro-flag for even faster row-wise operations
- Keyword arguments. I’ve been procrastinating finishing up the PR for it. If you want to help, the PR is here. Please let me know if you would like to assist!
I think we are getting closer to a 1.0 release.