# Various by-group strategies compared

#1

Update 12th Nov 2017
2nd Update: I implemented a multi-threaded version that is almost 2x faster than data.table!

I implemented a new radixgroup method that is a further 18% faster than the previous method on 250-million-element arrays.

Update 29th Oct 2017

I managed to repurpose radix sort to make Julia’s groupby performance on par with data.table, and it can even beat it in some cases!

Original post

I am preparing to release some new benchmarks of Julia vs R data.table group-by performance. I am now investigating a number of strategies for computing the mean of a value by group.

I have included my MWE in the gist below. I came up with 4 approaches; please do suggest improvements to the code, as I am new to Julia.

I am trying to find the mean value of `v1` by distinct values of `id4`. Normally `v1` and `id4` would be part of a dataframe or table of some sort, but for now I have included them as vectors.

Update (2017-10-22)
Thanks to @nalimilan’s helpful tips and comments, I managed to get a `meany_best` approach; see Approach 1 below.

# Approach 1 (`meany_best`) - zip with Dict

In this approach I use a single Dict that maps each group to its running sum and count, and use Base’s internal `ht_keyindex` to access values in the Dict quickly. This approach is general and works for any data type, so it is superior to Approach 4. The only approach that still beats it is Approach 3, which requires a large amount of prior knowledge.

# Approach 2 (`meany2`) - enumerate with Dict

Similar to Approach 1, but instead of zipping the two vectors I simply enumerate over one of them.

# Approach 3 (`meany3`) - what if I know the number of unique values in id4 beforehand?

I can just use two arrays `res` and `wt`, enumerate through `id4`, keep adding to them, and then summarise to get the mean.
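A minimal sketch of this two-array idea (assuming the ids are integers in `1:K` with `K` known; the function name is my own, not from the gist):

```julia
# Sketch of Approach 3: accumulate sums and counts into plain arrays.
# Assumes ids are integers in 1:K and K is known in advance.
function mean_by_group_arrays(id4::Vector{Int}, v1::Vector{<:Real}, K::Int)
    res = zeros(Float64, K)  # running sum per group
    wt  = zeros(Int, K)      # running count per group
    @inbounds for (i, id) in enumerate(id4)
        res[id] += v1[i]
        wt[id]  += 1
    end
    return res ./ wt         # elementwise means; NaN for empty groups
end
```

Array indexing here replaces hashing entirely, which is why this approach is so much faster than the Dict-based ones.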

# Approach 4 (`meany4`) - what if I work out the unique values in id4 first?

I use `countmap` to get the unique values in `id4` and then follow Approach 3.
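One hedged way to sketch this idea (I label ids inline with `get!` instead of `countmap`, and the function name is hypothetical): first assign each distinct id a consecutive code, then reuse the array accumulation from Approach 3.

```julia
# Sketch of Approach 4: discover the distinct ids first (here via a Dict of
# consecutive codes rather than countmap), then accumulate into arrays.
function mean_by_label(id4::AbstractVector, v1::AbstractVector{<:Real})
    labels = Dict{eltype(id4), Int}()   # group value -> code in 1:K
    codes = [get!(labels, id, length(labels) + 1) for id in id4]
    K = length(labels)
    res = zeros(Float64, K)             # running sum per code
    wt  = zeros(Int, K)                 # running count per code
    @inbounds for (i, c) in enumerate(codes)
        res[c] += v1[i]
        wt[c]  += 1
    end
    return Dict(id => res[c] / wt[c] for (id, c) in labels)
end
```

This pays the Dict cost only once per element in the labelling pass, and works for non-integer ids such as strings.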

## Timing: ~18 seconds

Clearly Approach 3 only works in limited situations, but Approach 4 is more generalisable. I understand Approaches 1 & 2 are slow because of the large amount of memory allocation there. How can I reduce the number of allocations?

Are there even faster methods I have missed? Multithreading?
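For what it’s worth, here is a hedged sketch of one multithreading strategy built on Approach 3: per-chunk accumulators merged at the end (assumes ids in `1:K`; the name is hypothetical; run with `julia -t N` to see a speedup):

```julia
# Sketch: split the input into one chunk per thread, accumulate each chunk
# into its own (sums, counts) pair, then merge. Assumes ids are in 1:K.
function mean_by_group_threaded(id4::Vector{Int}, v1::Vector{<:Real}, K::Int)
    n = length(id4)
    chunks = collect(Iterators.partition(1:n, cld(n, Threads.nthreads())))
    sums = [zeros(Float64, K) for _ in chunks]  # one accumulator per chunk
    cnts = [zeros(Int, K) for _ in chunks]
    Threads.@threads for c in 1:length(chunks)
        for i in chunks[c]
            @inbounds sums[c][id4[i]] += v1[i]
            @inbounds cnts[c][id4[i]] += 1
        end
    end
    return reduce(+, sums) ./ reduce(+, cnts)   # merge, then take means
end
```

Giving each chunk its own accumulator avoids any locking or atomic updates; the merge at the end is cheap when `K` is small relative to `n`.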

#2

You should type your `Dict`s to reduce (dramatically) the number of allocations. Then you can add `@inbounds` around the loops.

It’s worth noting that if you have `CategoricalArray` data, then you’re in the `meany3` case even with originally string data, since you can work directly with the integer codes.

#3

`Dict`s are also just a bad choice for things that have integer ids. There are much more common structures for doing `A[1] = 1.0` that are far faster. Is there a reason why those aren’t just `Array{Float64}(N)` or sparse arrays?

#4

That doesn’t work if you don’t know in advance the values you can get. For example, if there are only two different values but one of them is `typemax(Int64)`, your attempt to allocate a vector will fail badly. Ideally the data will be stored in a `CategoricalArray` or `PooledArray`, and you can use the consecutive integer codes which have been computed when constructing the vector (using a `Dict`, but only once for all).

#5

I tried adding Dict types to the function `meany`. However, the elapsed time on my computer is still 30 seconds, which is still slower than `meany4`. I assume this Dict approach will never be as fast? Which is perplexing, as I looked into the source of `countmap`, which uses a Dict behind the scenes to do the counting, so I am wondering why my code is so much slower.

``````
function meany(id4::Vector{T2}, v1::Vector{T}) where {T2<:Integer, T<:Real}
    res = Dict{T2, T}()
    wt = Dict{T2, Int64}()
    for (id, val) in zip(id4, v1)
        @inbounds res[id] = get(res, id, zero(T)) + val
        @inbounds wt[id] = get(wt, id, 0) + 1
    end
    return Dict(k => res[k] / wt[k] for k in keys(res))
end
``````

#6

One problem is that expressions like `res[iv[1]] = get(res,iv[1], 0) + iv[2]` are unnecessarily slow because the dict lookup is performed once by `get` and once by `setindex!`. Unfortunately, even though a fix is under discussion, there’s no official API to avoid this for now. You can use internal functions to work around it if you want; see what FreqTables does.

Since you have two dicts with the same keys, it would also help to store `(count, sum)` tuples inside a single dict to avoid another lookup. Overall you could get an almost 4× faster code.
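The tuple-in-one-Dict suggestion can be sketched like this (a minimal illustration with a hypothetical name; it still pays the `get`/`setindex!` double lookup mentioned above, but halves the number of hash tables):

```julia
# Sketch of the single-Dict variant: store (sum, count) tuples so only one
# hash table is maintained. Note that get + setindex! still hash the key twice.
function mean_by_tuple_dict(id4::Vector{T}, v1::Vector{S}) where {T, S<:Real}
    acc = Dict{T, Tuple{S, Int}}()
    for (id, val) in zip(id4, v1)
        s, n = get(acc, id, (zero(S), 0))
        acc[id] = (s + val, n + 1)
    end
    return Dict(k => s / n for (k, (s, n)) in acc)
end
```

Typing the Dict fully (`Dict{T, Tuple{S, Int}}`) is what eliminates the boxing allocations the thread discusses.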

#7

@nalimilan Thanks for the awesome tips!! This is my code now.

``````
import Base.ht_keyindex

const N = Int64(2e9/8);
const K = 100;
srand(1);
@time const id4 = rand(1:K, N); # large groups (int)
@time const v1 = rand(1:5, N);  # int in range [1,5]

timings = Dict();
function meany_best{S,T}(id4::Vector{T}, v1::Vector{S})::Dict{T,Real}
    res = Dict{T, Tuple{S, Int64}}()
    for (id, val) in zip(id4, v1)
        index = ht_keyindex(res, id)
        if index > 0
            @inbounds vw = res.vals[index]
            @inbounds res.vals[index] = (vw[1] + val, vw[2] + 1)
        else
            @inbounds res[id] = (val, 1)
        end
    end
    return Dict(k => res[k][1]/res[k][2] for k in keys(res))
end

@time res = meany_best(rand(1:100,2), rand(1:100,2)) # warmup
timings[:zip_dict] = @elapsed res = meany_best(id4, v1) # timing minus compilation time
``````

#8

What @nalimilan said, plus trying to find a more general algorithm that can deal with all types. As Approach 3 shows, under certain conditions using arrays is 10x faster.

#9

How fast is the new algorithm?

FWIW, @andyferris has proposed general APIs for split-apply-combine operations which seem to cover your case. See SAC.jl.

#10

I have updated the original post. It’s sub 10 seconds now, which is closer to R’s data.table (7-8 seconds). This is fast enough, although I would love to see a general algorithm in Julia that can beat data.table.

#11

I hope I understood the gist. This is what I get (2.95 seconds):

``````
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.2244 (2017-10-20 22:35 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 4253e56* (1 day old master)
|__/                   |  x86_64-linux-gnu

julia> using SAC

julia> const N = Int64(2e9/8);

julia> const K = 100;

julia> srand(1);

julia> @time const id4 = rand(1:K, N);# large groups (int)
3.613966 seconds (3.51 k allocations: 1.863 GiB, 0.11% gc time)

julia> @time const v1 =  rand(1:5, N); # int in range [1,5]
4.032845 seconds (8 allocations: 1.863 GiB, 0.11% gc time)

julia> function f()
           tmp = groupreduce(x->x[1], x->(x[2],1), (x,y)->(x[1]+y[1], x[2]+y[2]), zip(id4, v1))
           return Dict{Int,Float64}(kv.first=>(kv.second[1]/kv.second[2]) for kv in tmp)
       end
f (generic function with 1 method)

julia> f();

julia> @time f()
2.958926 seconds (26 allocations: 16.141 KiB)
Dict{Int64,Float64} with 100 entries:
68 => 3.00046
2  => 2.99942
89 => 3.00203
11 => 3.0003
39 => 3.00009
46 => 2.99894
85 => 3.00164
25 => 3.00016
55 => 2.9998
42 => 3.00069
66 => 3.00036
58 => 2.99921
29 => 3.00007
59 => 2.99829
8  => 3.00209
74 => 3.00189
95 => 3.00184
57 => 2.99979
90 => 2.99987
20 => 3.00071
14 => 2.99942
31 => 2.99912
78 => 3.00164
70 => 3.00076
52 => 2.99969
18 => 3.00145
33 => 3.00011
69 => 2.99885
96 => 2.99956
26 => 2.99968
35 => 3.00108
83 => 3.00045
65 => 2.99963
64 => 2.99874
17 => 2.99962
49 => 3.0001
84 => 3.00116
44 => 2.99967
4  => 2.99946
37 => 3.00003
45 => 3.00138
13 => 2.99993
86 => 2.99958
67 => 3.00017
99 => 3.00074
93 => 3.00098
94 => 3.00132
30 => 2.99852
1  => 2.99979
47 => 2.9982
54 => 3.00061
50 => 2.99983
77 => 2.99953
80 => 3.00003
40 => 3.00009
32 => 2.99897
82 => 2.99933
91 => 3.0002
⋮  => ⋮
``````

It’s worth running this timing on the same machine as the other examples.

Of course there is a goal in SAC to have the `reduce` functions apply a generic “final” aggregator so that e.g. `mean` can be done in a single reduction.

#12

This line of code feels like magic. I understand the first argument is the group-by key, the 2nd is a map, and the third is a reduce with an accumulator.

I can’t find the `groupreduce` function in the SAC.jl repo.
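For intuition, here is a hypothetical, untuned sketch of what a `groupreduce(by, f, op, itr)` with that calling convention could do (my own illustration, not SAC.jl’s actual implementation, hence the `_sketch` suffix):

```julia
# Hypothetical sketch matching the groupreduce(by, f, op, itr) call above:
# `by` extracts the group key, `f` maps each element, `op` combines the
# running accumulator with each newly mapped element.
function groupreduce_sketch(by, f, op, itr)
    out = Dict{Any, Any}()   # untyped for simplicity; a real version would infer
    for x in itr
        k = by(x)
        out[k] = haskey(out, k) ? op(out[k], f(x)) : f(x)
    end
    return out
end
```

With `f = x->(x[2], 1)` and a tuple-summing `op`, each group ends up with a `(sum, count)` pair, which is exactly what the transcript divides through at the end.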

#13

I just saw there is another SAC.jl, for seismic data! Haha. I was referring to my private, unregistered repository at https://github.com/andyferris/SAC.jl. I’ll probably have to rename it…

#14

@andyferris I just tested your code vs FastGroupBy.jl’s `meanby` function. Given our common interests/goals, do you want to collaborate? We can beat R’s data.table together!

On next.juliabox.com, for a small dataset (1 million rows) `meanby` is faster, and for a large dataset (10 million rows) both are about the same (see gist below). But of course SAC is more general.

I generated some test data using my DataBench.jl package.

#15

Cool, sure.

I see you are just starting out with FastGroupBy.jl - would you like to contribute to SAC.jl? (We should rename it…). PRs and comments are always very welcome. In this case a `groupmean` function would fit in well. I haven’t tuned the methods that much for speed, so any improvements there are obviously welcome.

So far, the work has focused on getting the semantics of the functions correct, so that they can be applied to generic data structures like `AbstractArray` and `Associative` (and nestings thereof) as well as to tables/dataframes, and on ensuring the design allows them to be fast (i.e. no design flaws that would make them impossible to optimize). There is a bit of a difficulty here in that the function signature for a table might naturally include a column name, while for generic containers it seems natural to use a function. I’d like to move in the direction of `groupreduce(containers...; by = ..., f = ..., op = ..., v0 = ...)` or `groupmean(v1, by = id4)` or something (that last one’s not quite right), but we need to wait for fast keyword arguments first.

#16

Sure, happy to contribute, close down FastGroupBy.jl, and work on SAC.jl. The first task is to choose a name. We will continue this discussion on GitHub then: https://github.com/andyferris/SAC.jl

#17

@andyferris, not really my business, but would SplitApplyCombines.jl not be a better name, more in line with the Julia convention of naming packages by what they do rather than as small unintelligible acronyms?

#18

@mkborregaard That’s exactly what’s been suggested in the github repo already.

#19

Yes, of course. The package began as my playground, so I gave it a stupid name (I was also doing MVTs, minimally viable tables, at the time).

#20

Right. Sorry for the noise!