# Row-wise quantile

How can I use the quantile! function row-wise on a dataframe?

df = DataFrame(A = 1:4, B = 5:8, C = 1:4, D = 5:8)

For example, I would like to add a column with the 0.15 percentile for each row.

Thank you for the help!

`AsTable` and `collect` are useful here

``````julia> using DataFramesMeta, Statistics

julia> transform(df, AsTable([:A, :B, :C]) => ByRow(t -> quantile(collect(t), .15)) => :q_15)
4×5 DataFrame
Row │ A      B      C      D      q_15
│ Int64  Int64  Int64  Int64  Float64
─────┼─────────────────────────────────────
1 │     1      5      1      5      1.0
2 │     2      6      2      6      2.0
3 │     3      7      3      7      3.0
4 │     4      8      4      8      4.0
``````

``````julia> @rtransform df :q_15 = quantile(collect(AsTable([:A, :B, :C])), .15)
4×5 DataFrame
Row │ A      B      C      D      q_15
│ Int64  Int64  Int64  Int64  Float64
─────┼─────────────────────────────────────
1 │     1      5      1      5      1.0
2 │     2      6      2      6      2.0
3 │     3      7      3      7      3.0
4 │     4      8      4      8      4.0
``````

EDIT: It looks like you don’t even need `collect` in there anywhere .

Thank you, Peter! This is helpful. However, is there a solution for the case where you have many columns? In the data I am working with, I have 10K columns where I want to calculate row-wise percentiles. Again, thank you for the help!

Can’t speak for dataframes but if you can get just a plain `Array` then VectorizedStatistics.jl (and NanStatistics.jl) have fast (after compilation) methods with dimension-wise quantiles and percentiles

``````julia> A = rand(1000, 10000);

julia> @time vquantile!(copy(A), 0.15, dims=2)
0.727348 seconds (3 allocations: 76.302 MiB)
1000×1 Matrix{Float64}:
0.15339182476244487
0.15116373206593528
0.14991189653752926
0.14992050611491373
0.1502873388769227
0.14873368062909753
⋮
0.14993164357947808
0.14337271376066515
0.15202934011278724
0.14750348034834604
0.15218943855833048

julia> @time vquantile!(copy(A), 0.15, dims=1)
0.131755 seconds (4 allocations: 76.370 MiB)
1×10000 Matrix{Float64}:
0.132576  0.13291  0.17419  0.165488  0.168598  0.146831  …  0.151196  0.15443  0.147143  0.146142  0.158048  0.139629
``````

The `copy` is because this is an in-place method for efficiency, which you definitely don’t want if you’re going to alternate between row-wise and column-wise percentiles but is great if you’re going to take many different percentiles along the same dimension successively

1 Like

How many rows do you have? In this instance it may be better to call `permutedims` on your data frame and operate by column.

The solution I give will not scale to 10K columns. You probably want a different data structure

If you have enough RAM most likely moving the data to a `Matrix` and then doing rowwise quantiles will be most efficient (I give an example using standard functions, specialized packages will be faster).
If you want to do it in DataFrames.jl here is the way to do it:

``````julia> using DataFrames, Statistics

julia> df = DataFrame(rand(100, 10_000), :auto);

julia> @time transform(df, AsTable(All()) => ByRow((t -> quantile(t, 0.15))∘collect) => :q_15);
0.093942 seconds (242.68 k allocations: 35.970 MiB, 6.62% gc time, 61.12% compilation time)
``````

which is slower but not by much in comparison to:

``````julia> @time quantile.(eachrow(Matrix(df)), 0.15);
0.016732 seconds (10.21 k allocations: 15.497 MiB)
``````

(all timings are after compilation)

Note that the crucial part is `∘collect`. DataFrames.jl detects such composition and avoids excessive compilation in such cases.

2 Likes

On larger data the difference is even smaller:

``````julia> df = DataFrame(rand(10_000, 10_000), :auto);

julia> @time quantile.(eachrow(Matrix(df)), 0.15);
1.954686 seconds (30.02 k allocations: 1.491 GiB, 1.86% gc time)

julia> @time select(df, AsTable(All()) => ByRow((t -> quantile(t, .15))∘collect) => :q_15);
2.081471 seconds (220.47 k allocations: 1.500 GiB, 7.30% gc time, 2.69% compilation time)
``````

(mostly GC is in play here as the DataFrames.jl method allocates more)

1 Like