I’m trying to count the number of occurrences of each value in a column in a JuliaDB indexed table, but I cannot figure out how to do so. In a Data Frame, I would simply do:
sort!(by(dataframe, :column_to_target, nrow), :x1, rev=true)
This line goes through the column :column_to_target
in the Data Frame dataframe
and produces a new Data Frame that has each unique value in :column_to_target
along with the number of times it occurs, expressed in descending order in column :x1
.
How can I do this in JuliaDB? I’m assuming it can be done with the reduce
method but I’ve been unable to figure it out.
…so I’ve halfway answered my question (I guess I should have kept grinding for 15 more minutes before deciding to post ). I was able to get the counts of the values by using OnlineStats.jl likes this:
reduce(CountMap(), indexed_table, select = :column_to_target)
The problem with this, though, is that the output is a Dict which means I now have to jump through a couple of hoops to get the data sorted by values in descending order. There has to be a one-liner that can achieve this as can be done when working with Data Frames.
You have some options if you want the returned value to be an IndexedTable:
julia> t = table(rand(1:10, 10^6), randn(10^6));
julia> @time groupby(length, t, 1; select=2)
0.124609 seconds (3.90 M allocations: 74.765 MiB, 5.46% gc time)
Table with 10 rows, 2 columns:
1 2
──────────
1 99961
2 99655
3 100019
4 100234
5 99656
6 100052
7 100308
8 100623
9 100012
10 99480
julia> @time groupreduce(OnlineStatsBase.Counter(), t, 1;select=2)
0.008767 seconds (222 allocations: 7.640 MiB)
Table with 10 rows, 2 columns:
1 2
────────────────────────────────────
1 Counter: n=99961 | value=99961
2 Counter: n=99655 | value=99655
3 Counter: n=100019 | value=100019
4 Counter: n=100234 | value=100234
5 Counter: n=99656 | value=99656
6 Counter: n=100052 | value=100052
7 Counter: n=100308 | value=100308
8 Counter: n=100623 | value=100623
9 Counter: n=100012 | value=100012
10 Counter: n=99480 | value=99480
julia> select(ans, (1,2=>value))
Table with 10 rows, 2 columns:
1 2
──────────
1 99961
2 99655
3 100019
4 100234
5 99656
6 100052
7 100308
8 100623
9 100012
10 99480
2 Likes
Awesome, thank you!! For future readers, my final solution is here:
sort(groupby(length, indexed_table, :column_to_target), :length, rev = true)