How to efficiently get the DataFrame row index of a statistical operation's result

Given a DataFrame like this:

using DataFrames, StatsBase  # percentile comes from StatsBase
df = DataFrame(A = 10*rand(10), B = 10*rand(10))
sort!(df, :A)
q1 = percentile(df.A, 25)

I would like to get the value in column B from the row whose A value is the smallest one greater than q1.

For example, in the sample below I’d like to get 2.81803

10×2 DataFrame
 Row │ a          b
     │ Float64    Float64
─────┼────────────────────
   1 │ 0.0734051  3.37658
   2 │ 2.54486    8.046
   3 │ 2.93121    1.538
   4 │ 3.17508    2.81803
   5 │ 3.6127     3.09565
   6 │ 3.83031    8.45346
   7 │ 4.73316    5.52375
   8 │ 7.01111    1.80292
   9 │ 8.60424    7.87036
  10 │ 8.68761    8.0496

I’d like to make this efficient and possibly avoid loops.

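You can use df.B[searchsortedfirst(df.A, q1)]. This works because df.A is sorted: searchsortedfirst does a binary search and returns the index of the first element that is greater than or equal to q1. If q1 can coincide with an element of df.A and you need the strictly greater one, use searchsortedlast(df.A, q1) + 1 instead.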
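
As a minimal sketch of that lookup, here is the sample table from the question rebuilt by hand, with both the "greater than or equal" and the strictly "greater than" searches. It assumes the column is already sorted, and it uses quantile(x, 0.25) from the Statistics standard library in place of percentile(x, 25), which computes the same quantity. The names i_ge, i_gt and b_value are just illustrative.

```julia
using DataFrames, Statistics

# Sample data from the question, already sorted by column A.
df = DataFrame(
    A = [0.0734051, 2.54486, 2.93121, 3.17508, 3.6127,
         3.83031, 4.73316, 7.01111, 8.60424, 8.68761],
    B = [3.37658, 8.046, 1.538, 2.81803, 3.09565,
         8.45346, 5.52375, 1.80292, 7.87036, 8.0496],
)

# quantile(x, 0.25) is the same quantity as StatsBase's percentile(x, 25).
q1 = quantile(df.A, 0.25)

# Binary search on the sorted column: first index with df.A[i] >= q1.
i_ge = searchsortedfirst(df.A, q1)

# Strict variant: first index with df.A[i] > q1.
i_gt = searchsortedlast(df.A, q1) + 1

# Both searches return length(df.A) + 1 when no element qualifies,
# so check bounds before indexing.
b_value = i_gt <= nrow(df) ? df.B[i_gt] : missing
```

With this data q1 falls between rows 3 and 4, so both searches point at row 4 and b_value is 2.81803. The two indices only differ when q1 is exactly equal to one of the A values.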

I don’t know if there’s an implementation of percentile or quantile for sorted arrays that also returns the index…
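
In the meantime, a small wrapper can return both pieces at once. The sketch below is only an illustration: quantile_with_index is a hypothetical helper, not a function from Statistics or StatsBase. It relies on the sorted keyword of quantile to skip the internal sort, and it assumes the input vector is already sorted.

```julia
using Statistics

# Hypothetical helper (not part of Statistics or StatsBase): returns the
# p-quantile of an already sorted vector together with the index of the
# first element strictly greater than it.
function quantile_with_index(v::AbstractVector{<:Real}, p::Real)
    q = quantile(v, p; sorted = true)   # v is assumed sorted, so skip sorting
    i = searchsortedlast(v, q) + 1      # length(v) + 1 if nothing is greater
    return q, i
end

q, i = quantile_with_index([1.0, 2.0, 3.0, 4.0, 5.0], 0.25)
```

Here the quantile lands exactly on the second element, so q is 2.0 and i is 3. With searchsortedfirst the index would have been 2, which is why the strict variant is used in this sketch.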
