Row-wise quantile

If you have enough RAM, moving the data to a Matrix and then computing row-wise quantiles will most likely be the most efficient approach (the example below uses standard functions; specialized packages will be faster).
If you want to do it in DataFrames.jl, here is the way to do it:

julia> using DataFrames, Statistics

julia> df = DataFrame(rand(100, 10_000), :auto);

julia> @time transform(df, AsTable(All()) => ByRow((t -> quantile(t, 0.15))∘collect) => :q_15);
  0.093942 seconds (242.68 k allocations: 35.970 MiB, 6.62% gc time, 61.12% compilation time)

This is slower, but not by much, compared to:

julia> @time quantile.(eachrow(Matrix(df)), 0.15);
  0.016732 seconds (10.21 k allocations: 15.497 MiB)

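(All timings are from a second run. The transform call still reports some compilation time because the anonymous function t -> quantile(t, 0.15) is a new function every time the line is evaluated.)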
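To check that the two approaches really compute the same thing, here is a minimal sketch on a small table. The 5×4 size and the variable names via_transform and via_matrix are my own choices for illustration, not part of the benchmark above.

```julia
using DataFrames, Statistics

# Small table so the check is quick: 5 rows, 4 columns (sizes are arbitrary).
df = DataFrame(rand(5, 4), :auto)

# Row-wise 0.15 quantile computed inside DataFrames.jl, stored as column :q_15.
via_transform = transform(df,
    AsTable(All()) => ByRow((t -> quantile(t, 0.15))∘collect) => :q_15).q_15

# The same computation after converting the data frame to a Matrix.
via_matrix = quantile.(eachrow(Matrix(df)), 0.15)

# Both should agree up to floating-point tolerance, one value per row.
@assert via_transform ≈ via_matrix
```

Both vectors have one entry per row of df, so whichever form you pick is a matter of speed and convenience rather than of results.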

Note that the crucial part is ∘collect. DataFrames.jl detects this composition inside ByRow and avoids excessive compilation in such cases, which matters here because the table has 10,000 columns.
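If you call this repeatedly, one option is to compose a named function with collect instead of an anonymous one. This is only a sketch: q15 is a hypothetical helper name I introduce here, and I am assuming DataFrames.jl treats q15∘collect the same way as the anonymous version, since both are a function composed with collect.

```julia
using DataFrames, Statistics

# Hypothetical helper (not from the original post): 0.15 quantile of a vector.
q15(v) = quantile(v, 0.15)

# Smaller than the benchmark table to keep the sketch quick to run.
df = DataFrame(rand(20, 50), :auto)

# q15∘collect first collects each row into a Vector, then applies q15 to it.
out = transform(df, AsTable(All()) => ByRow(q15∘collect) => :q_15)

# Cross-check against the Matrix-based computation.
@assert out.q_15 ≈ quantile.(eachrow(Matrix(df)), 0.15)
```

Because q15 is defined once, repeated calls reuse the same function rather than creating a fresh anonymous one each time the line is evaluated.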
