Hello (first post on this forum)! I am new to Julia, transitioning from R. I am trying to implement a split-apply-combine strategy on a matrix without converting it into a DataFrame.
The reason: I need to do this operation in each iteration of a Monte Carlo simulation, and I need to switch between arrays and their sparse matrix representation, so I want to minimize type conversions (not sure if this makes sense).
The split-apply-combine would work as follows: take a matrix with three columns, split it by the unique combinations of values in the first two columns (which together define the groups), and then add a fourth column that equals one if the value in the third column is the maximum within its group. Here is the comprehension I am working on; the array A below has approximately the same dimensions as the examples from my data.
julia> using SplitApplyCombine, Random
julia> a, b = [150, 1000] # These can be made smaller for testing purposes
julia> A = hcat(repeat(1:a, outer = b), repeat(1:a, inner = b), randn(a*b))
150000×3 Matrix{Float64}:
1.0 1.0 -0.837408
2.0 1.0 0.793393
3.0 1.0 0.297672
4.0 1.0 -0.142228
5.0 1.0 0.606502
6.0 1.0 0.573417
⋮
145.0 150.0 -0.474295
146.0 150.0 0.816623
147.0 150.0 -2.17863
148.0 150.0 0.455542
149.0 150.0 -0.868958
150.0 150.0 0.519702
julia> idx = [argmax(A[findall(A[:, 1] .== b[1] .&& A[:, 2] .== b[2]), 3]) for b ∈ unique(splitdims(A[:, [1,2]], 1))]
The idea is to then use the “argmax” indices to construct the binary column that I want. However, the last line takes about one minute on my laptop, which I take as an indication that I am doing something wrong, since Julia is not supposed to be this slow for a loop over a relatively small array. I also think my syntax is overly intricate, which is another indication that something is wrong.
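For comparison, here is a minimal sketch of a single-pass alternative that stays on a plain matrix. The comprehension above rescans all 150000 rows once per group, and `argmax` on a subset returns a position within that subset rather than a row of the full matrix. The sketch instead keeps a `Dict` from each (column 1, column 2) pair to the row currently holding the largest column-3 value. The name `flag_group_max` is hypothetical (not from any package), and marking only the first row in case of ties is an assumption on my part.

```julia
using Random

# Hypothetical helper (illustrative name): returns a 0/1 vector that marks,
# for each group defined by columns 1 and 2, the row with the largest
# value in column 3. Assumption: on ties, only the first such row is marked.
function flag_group_max(M::AbstractMatrix)
    T = eltype(M)
    best = Dict{Tuple{T,T},Int}()   # group key => row index of current maximum
    for i in axes(M, 1)
        key = (M[i, 1], M[i, 2])
        j = get(best, key, nothing)
        if j === nothing || M[i, 3] > M[j, 3]
            best[key] = i
        end
    end
    flag = zeros(T, size(M, 1))
    for i in values(best)
        flag[i] = one(T)
    end
    return flag
end

Random.seed!(1)                     # seed chosen arbitrarily for reproducibility
a, b = 150, 1000
A = hcat(repeat(1:a, outer = b), repeat(1:a, inner = b), randn(a * b))
A4 = hcat(A, flag_group_max(A))     # original three columns plus the binary column
```

Each row is visited once, so the cost grows with the number of rows rather than with rows times groups. I have not benchmarked this against other approaches.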
I looked for threads with similar issues but could not find a suitable strategy that does not resort to DataFrames. Thanks for helping!!!