Hi, I would be grateful for some advice on how to create a column containing the ordinal rank of a score grouped by another column. For example:
```julia
using JuliaDB
using StatsBase

Nx, Ny = 4, 3
x = repeat(1:Nx, inner=Ny)  # location
y = repeat(1:Ny, outer=Nx)  # date
z = rand(Nx*Ny)             # score
t = table((x=x, y=y, z=z))

trk = setcol(t, :rk, (:z, :x) => row -> begin
    t2 = filter(i -> i.x == row.x, t)    # all rows at the same location
    X = select(t2, :z)                   # scores at that location
    idx = findfirst(x -> x == row.z, X)  # position of this row's score
    rk = ordinalrank(X, rev=true)[idx]   # rank 1 = highest score
    rk
end
)
```
I would like to do something like the above on tables of 1-5 million rows, but it does not scale well, since it filters the whole table once per row. My initial questions are:
- Is there a more sensible implementation?
- Should I parallelize it as per https://docs.julialang.org/en/v1/manual/parallel-computing/?
- Or is it better to let JuliaDB handle parallelization by itself?
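For reference, here is a minimal sketch of the kind of single-pass approach I have in mind, written in plain Julia rather than JuliaDB. It collects the row indices of each group once and then ranks each group in one go. `grouped_rank` is a hypothetical helper name, not part of JuliaDB or StatsBase, and it uses `invperm(sortperm(...; rev=true))` from Base in place of `ordinalrank(...; rev=true)`, which I assume gives the same ranks for tie-free scores.

```julia
# Hypothetical helper: ordinal rank of each score within its group,
# with rank 1 assigned to the highest score in the group.
function grouped_rank(groups::AbstractVector, scores::AbstractVector)
    length(groups) == length(scores) ||
        throw(ArgumentError("groups and scores must have the same length"))

    # Collect the row indices belonging to each group in one pass.
    idxs = Dict{eltype(groups), Vector{Int}}()
    for (i, g) in pairs(groups)
        push!(get!(() -> Int[], idxs, g), i)
    end

    # Rank each group once instead of re-filtering the table per row.
    ranks = Vector{Int}(undef, length(scores))
    for ix in values(idxs)
        ranks[ix] = invperm(sortperm(scores[ix]; rev=true))
    end
    return ranks
end

# Small usage example with two locations and three scores each.
groups = [1, 1, 1, 2, 2, 2]
scores = [0.2, 0.9, 0.5, 0.7, 0.1, 0.3]
grouped_rank(groups, scores)  # → [3, 1, 2, 1, 3, 2]
```

The resulting vector could presumably be attached to the table as a new column, but I do not know whether this is the idiomatic JuliaDB way or how it would interact with a distributed table.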