Counting Occurrences in JuliaDB

juliadb
#1

I’m trying to count the number of occurrences of each value in a column in a JuliaDB indexed table, but I cannot figure out how to do so. In a Data Frame, I would simply do:

sort!(by(dataframe, :column_to_target, nrow), :x1, rev=true)

This line goes through the column :column_to_target in the Data Frame dataframe and produces a new Data Frame that has each unique value in :column_to_target along with the number of times it occurs, expressed in descending order in column :x1.

How can I do this in JuliaDB? I’m assuming it can be done with the reduce method but I’ve been unable to figure it out.

#2

…so I’ve halfway answered my question (I guess I should have kept grinding for 15 more minutes before deciding to post :stuck_out_tongue_winking_eye:). I was able to get the counts of the values by using OnlineStats.jl likes this:

reduce(CountMap(), indexed_table, select = :column_to_target)

The problem with this, though, is that the output is a Dict which means I now have to jump through a couple of hoops to get the data sorted by values in descending order. There has to be a one-liner that can achieve this as can be done when working with Data Frames.

#3

You have some options if you want the returned value to be an IndexedTable:

julia> t = table(rand(1:10, 10^6), randn(10^6));

julia> @time groupby(length, t, 1; select=2)
  0.124609 seconds (3.90 M allocations: 74.765 MiB, 5.46% gc time)
Table with 10 rows, 2 columns:
1   2
──────────
1   99961
2   99655
3   100019
4   100234
5   99656
6   100052
7   100308
8   100623
9   100012
10  99480

julia> @time groupreduce(OnlineStatsBase.Counter(), t, 1;select=2)
  0.008767 seconds (222 allocations: 7.640 MiB)
Table with 10 rows, 2 columns:
1   2
────────────────────────────────────
1   Counter: n=99961 | value=99961
2   Counter: n=99655 | value=99655
3   Counter: n=100019 | value=100019
4   Counter: n=100234 | value=100234
5   Counter: n=99656 | value=99656
6   Counter: n=100052 | value=100052
7   Counter: n=100308 | value=100308
8   Counter: n=100623 | value=100623
9   Counter: n=100012 | value=100012
10  Counter: n=99480 | value=99480

julia> select(ans, (1,2=>value))
Table with 10 rows, 2 columns:
1   2
──────────
1   99961
2   99655
3   100019
4   100234
5   99656
6   100052
7   100308
8   100623
9   100012
10  99480
2 Likes
#4

Awesome, thank you!! For future readers, my final solution is here:

sort(groupby(length, indexed_table, :column_to_target), :length, rev = true)