[ANN] RuleMiner.jl - Efficient frequent itemset and association rule mining

JaredSchwartz · June 29, 2024, 10:01pm

RuleMiner.jl

RuleMiner is a Julia package for association rule and frequent itemset mining inspired by the arules R package and the SPMF Java library.

About Association Rule Mining

Association rule and frequent itemset mining are techniques for efficiently finding and quantifying the co-occurrence of items within observations (transactions) in a dataset. They are most commonly used in retail applications such as market-basket analysis, but they are also used in other domains like web traffic analysis and bioinformatics.
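"Quantifying co-occurrence" usually means three measures: support (how often an itemset appears), confidence (how often the right-hand side appears given the left-hand side), and lift (confidence relative to the right-hand side's baseline frequency). The brute-force Python sketch below computes them on a small, made-up basket dataset. It only illustrates the definitions and is not RuleMiner's API or implementation.

```python
# Hypothetical toy market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs): support(lhs union rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence relative to the baseline support of rhs; > 1 suggests a positive association."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


# Rule {diapers} => {beer}
print(support({"diapers", "beer"}, transactions))
print(confidence({"diapers"}, {"beer"}, transactions))
print(lift({"diapers"}, {"beer"}, transactions))
```

For {diapers} => {beer}, the pair appears in 3 of 5 transactions (support 0.6) and diapers appears in 4 (support 0.8), so confidence is 0.6 / 0.8 = 0.75; beer's own support is 0.6, giving a lift of 1.25. Printed values may differ in the last floating-point digit.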

Design Goals

The goal in designing this package was to build a frequent itemset mining tool in Julia that is easy to use and extremely fast at extracting relevant patterns from transaction datasets.

One major headache with mining tools in other languages is their unfriendly output formats, which either use custom data structures or write directly to files. RuleMiner therefore returns its results directly as a DataFrames.jl DataFrame for easier slicing, sorting, and general portability.

The other key design goal was speed. All the algorithms are multi-threaded and optimized to mine the rules as fast as possible. The multi-threaded implementation of this package allows it to outperform the single-threaded C modules that underpin tools like the arules package in R. File reading has also been optimized to massively reduce the I/O time for reading transactional data files compared to implementations in Python and R.

Current Status

RuleMiner currently supports two algorithms, Apriori and ECLAT, with more planned for future releases. If you’d like to help build out the package or report an issue, you can find the repository on GitHub.
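Where Apriori scans the transactions level by level, ECLAT uses a vertical layout: each item maps to the set of transaction IDs (its tidset) that contain it, and an itemset's support is the size of the intersection of its items' tidsets. The minimal, single-threaded Python sketch below shows that idea depth-first, assuming `min_support` is an absolute count and using plain Python sets as tidsets; RuleMiner's multithreaded Julia implementation will differ.

```python
# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def eclat(transactions, min_support):
    """Depth-first ECLAT sketch returning {frozenset(itemset): absolute support}.

    Assumes `min_support` is an absolute transaction count.
    """
    # Vertical layout: item -> tidset (ids of the transactions containing it).
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    # Frequent single items, in a fixed order so each itemset is generated once.
    singles = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    results = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            # Support of prefix + item = size of the tidset intersection.
            new_tids = prefix_tids & tids
            if len(new_tids) < min_support:
                continue  # anti-monotone: no superset of this itemset can be frequent
            itemset = prefix | {item}
            results[itemset] = len(new_tids)
            # Only items later in the fixed order may extend this itemset.
            extend(itemset, new_tids, candidates[i + 1:])

    # Start from the empty prefix, whose tidset is every transaction.
    extend(frozenset(), set(range(len(transactions))), singles)
    return results


frequent = eclat(transactions, min_support=3)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Because each branch only intersects sets it already holds, independent branches (one per starting item) do not share state, which is one reason depth-first ECLAT parallelizes naturally.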

If you have any suggestions or comments, I’d love to hear them!

quinnj · July 7, 2024, 12:28am

This is very cool; I’ve always wanted to do some great association rule algorithms in Julia; excited to see this work.

JaredSchwartz · July 20, 2024, 11:04pm

RuleMiner.jl 0.3.0 Release Notes

RuleMiner 0.3.0 is out! Here are the changes since 0.1.0 (when I last posted):

0.2.0

- Added FP-Growth algorithm for frequent itemset mining
- Implemented performance improvements to ECLAT and Apriori
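FP-Growth avoids Apriori-style candidate generation by first compressing the transactions into a prefix tree (an FP-tree), where transactions that share their most frequent items share a path. The Python sketch below shows only the two-pass tree construction, assuming an absolute `min_support` and breaking ties in item frequency alphabetically; the recursive conditional-tree mining step is omitted, and this is not RuleMiner's code.

```python
# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


class FPNode:
    """One FP-tree node: an item, its count, its parent, and children keyed by item."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    """Build an FP-tree and its header table (no mining step)."""
    # Pass 1: count single items and keep only the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    frequent = {i: c for i, c in counts.items() if c >= min_support}

    root = FPNode(None, None)
    header = {}  # item -> every node holding that item (the node-link table)
    # Pass 2: insert each transaction's frequent items in descending-frequency
    # order so that common prefixes are shared.
    for t in transactions:
        path = sorted((i for i in t if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


root, header = build_fp_tree(transactions, min_support=3)
for item, nodes in header.items():
    # Summing counts along an item's node-links recovers the item's support.
    print(item, len(nodes), sum(n.count for n in nodes))
```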

0.3.0

- Added support for four closed itemset mining algorithms:
  - FPClose
  - CHARM
  - LCM
  - CARPENTER
- Added LevelWise algorithm to recover frequent itemsets from closed itemsets
- Moved from the @threads macro to @sync/@spawn blocks in ECLAT, Apriori, and FP-tree creation to better balance the often asymmetric workload across processing cores
- Added a dependency on Combinatorics.jl
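A closed itemset has no proper superset with the same support, so the frequent closed itemsets and their supports are a lossless summary of all frequent itemsets: every frequent itemset is a subset of some closed one, and its support equals the largest support among the closed itemsets containing it. The brute-force Python sketch below demonstrates that recovery with hypothetical input; it is not the LevelWise algorithm (which works level by level rather than enumerating all subsets) or RuleMiner's implementation, and it assumes absolute supports.

```python
from itertools import combinations


def frequent_from_closed(closed):
    """Recover every frequent itemset and its support from frequent closed itemsets.

    `closed` maps frozenset -> absolute support. Brute-force subset enumeration
    for illustration only.
    """
    result = {}
    for itemset, sup in closed.items():
        items = sorted(itemset)
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                key = frozenset(subset)
                # A subset's support is the maximum over all closed supersets.
                if result.get(key, 0) < sup:
                    result[key] = sup
    return result


# Hypothetical frequent closed itemsets (absolute supports) from a small basket dataset.
closed = {
    frozenset({"bread"}): 4,
    frozenset({"diapers"}): 4,
    frozenset({"milk"}): 4,
    frozenset({"beer", "diapers"}): 3,
    frozenset({"bread", "diapers"}): 3,
    frozenset({"bread", "milk"}): 3,
    frozenset({"diapers", "milk"}): 3,
}
recovered = frequent_from_closed(closed)
for itemset, sup in sorted(recovered.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Note that {beer} is not closed here (its only occurrences coincide with {beer, diapers}), yet its support of 3 is still recovered from that closed superset.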

Future Plans

0.4.0 is planned to include maximal itemset mining algorithms

Links: GitHub || JuliaHub

JaredSchwartz · August 11, 2024, 5:43am

RuleMiner.jl 0.4.0 Release Notes

RuleMiner 0.4.0 has been released! Here are the changes since 0.3.0:

0.3.1

- Patched Apriori to use the same Float/Int conversion method as the other algorithms when handling relative min_support values.
- Patched Apriori to use absolute support internally when evaluating candidate rules.
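Converting a relative (Float) min_support into an absolute (Int) count needs a rounding rule. One common convention, sketched below in Python, rounds the fractional threshold up so that "count is at least the absolute threshold" means the same as "count / n is at least the relative threshold". The post does not say whether RuleMiner rounds up, down, or to nearest, so `absolute_min_support` and its rounding rule are hypothetical. Parsing the float through its decimal string sidesteps binary rounding: in plain floating point, 0.07 * 100 evaluates to 7.000000000000001 and would round up to 8.

```python
import math
from fractions import Fraction


def absolute_min_support(min_support, n_transactions):
    """Convert a min_support threshold into an absolute transaction count.

    Hypothetical convention (not taken from RuleMiner): a float in (0, 1] is a
    fraction of all transactions and is rounded up; an int is already an
    absolute count and is returned unchanged.
    """
    if isinstance(min_support, float):
        if not 0.0 < min_support <= 1.0:
            raise ValueError("relative min_support must be in (0, 1]")
        # Fraction(str(x)) treats 0.07 as exactly 7/100, not the nearest binary float.
        return math.ceil(Fraction(str(min_support)) * n_transactions)
    return min_support


print(absolute_min_support(0.07, 100))
print(absolute_min_support(0.5, 5))
print(absolute_min_support(3, 5))
```

Once the threshold is an integer, every later comparison is an exact integer comparison against raw counts, which is presumably the benefit of using absolute support internally.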

0.4.0

- Added support for two maximal itemset mining algorithms:
  - FPMax
  - GenMax
- Reworked internal FP-tree construction to use atomics instead of a lock-based multithreading approach, leading to significant speed gains and lower memory usage for all FP algorithms.
- Reworked load_transactions so it no longer relies on LuxurySparse.jl for COO-to-CSC sparse conversion, and removed that dependency.
- Moved the DataFrame-to-Transactions conversion function into the Transactions struct as a constructor.
- Completely overhauled the documentation and fully documented all public functions and structs.
- New logo:

[RuleMiner logo]
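Maximal itemsets, the target of FPMax and GenMax, are frequent itemsets with no frequent proper superset. They are an even more compact summary than closed itemsets, but they only tell you which itemsets are frequent, not the exact supports of their subsets. The quadratic Python filter below just illustrates the definition on an already-mined, hypothetical result; FPMax and GenMax find maximal itemsets directly without enumerating every frequent itemset first.

```python
def maximal_itemsets(frequent):
    """Keep frequent itemsets that have no frequent proper superset.

    `frequent` maps frozenset -> support. Quadratic brute-force filter for illustration.
    """
    # For frozensets, `a < b` tests whether a is a proper subset of b.
    return {
        itemset: sup
        for itemset, sup in frequent.items()
        if not any(itemset < other for other in frequent)
    }


# Hypothetical frequent itemsets (absolute supports) from a small basket dataset.
frequent = {
    frozenset({"bread"}): 4,
    frozenset({"diapers"}): 4,
    frozenset({"milk"}): 4,
    frozenset({"beer"}): 3,
    frozenset({"beer", "diapers"}): 3,
    frozenset({"bread", "diapers"}): 3,
    frozenset({"bread", "milk"}): 3,
    frozenset({"diapers", "milk"}): 3,
}
maximal = maximal_itemsets(frequent)
for itemset, sup in sorted(maximal.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), sup)
```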

Future Plans

Future releases will add high utility mining algorithms as well as sequential mining algorithms.