Hi, I would be grateful for some advice on how to create a column containing the ordinal rank of a score grouped by another column. For example:
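```julia
using JuliaDB
using StatsBase

Nx, Ny = 4, 3
x = repeat(1:Nx, inner=Ny)  # location
y = repeat(1:Ny, outer=Nx)  # date
z = rand(Nx*Ny)             # score
t = table((x=x, y=y, z=z))

# For each row, rank its score among all rows that share the same location.
trk = setcol(t, :rk, (:z, :x) => row -> begin
    t2 = filter(i -> i.x == row.x, t)    # rows with the same location
    X = select(t2, :z)                   # scores at that location
    idx = findfirst(x -> x == row.z, X)  # position of this row's score
    ordinalrank(X, rev=true)[idx]        # rank 1 = highest score
end)
```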
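To make the target computation explicit, here is a minimal sketch of the same per-group ranking in plain Julia (Base only, no JuliaDB). `grouped_ordinalrank_desc` is a hypothetical helper name, not a JuliaDB or StatsBase function. It assumes ties should be broken by order of appearance, which is what `ordinalrank` does, and it collects the row indices of each group in one pass rather than filtering the whole table for every row.

```julia
# Hypothetical helper (not part of JuliaDB or StatsBase):
# descending ordinal rank of `scores` within each group of `groups`.
function grouped_ordinalrank_desc(groups, scores)
    ranks = Vector{Int}(undef, length(scores))

    # One pass: collect the row indices belonging to each group.
    idxs = Dict{eltype(groups), Vector{Int}}()
    for (i, g) in enumerate(groups)
        push!(get!(() -> Int[], idxs, g), i)
    end

    # Within each group, sort once and assign ranks 1, 2, ... by descending score.
    for rows in values(idxs)
        order = sortperm(scores[rows], rev=true)  # stable, so ties keep input order
        ranks[rows[order]] .= 1:length(rows)
    end
    return ranks
end
```

With the vectors from the example, `rk = grouped_ordinalrank_desc(x, z)` should give one rank per row in the original row order. Attaching `rk` to the table as a new column is left out here, since that step assumes the table has kept the row order of the input vectors.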
I would like to do something like the above on tables of 1-5 million rows, but it does not scale well. My initial questions are:
- Is there a more sensible implementation?
- Should I parallelize it myself, as described in the "Parallel Computing" section of the Julia manual?
- Or is it better to let JuliaDB handle parallelization by itself?
Thanks!