Pleased to announce the Imbalance.jl package, a Julia toolkit offering a wide range of established oversampling and undersampling techniques for tackling class imbalance and improving classification model performance.
Features

- Supports multi-class variants of the algorithms, with methods that handle both nominal and continuous features
- Supports table input/output formats as well as matrices
- Comprehensively documented, with illustrative (visual) and practical examples for each method in the method documentation and examples sections
- Provides MLJ and TableTransforms interfaces alongside the default pure functional interface for each method
- Can wrap an arbitrary number of resampling models together with an MLJ classification model via MLJBalancing so they function as one unified model
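As a quick illustration of the pure functional interface, here is a hedged sketch; the function name `random_oversample` and the `ratios` keyword are taken from the package docs, so treat the exact signature as an assumption:

```julia
using Imbalance

X = rand(100, 3)                        # feature matrix (rows = observations)
y = [i <= 20 ? 1 : 0 for i in 1:100]    # 20 minority labels, 80 majority labels

# Oversample every class up to the majority-class count (ratios = 1.0)
Xover, yover = random_oversample(X, y; ratios = 1.0)
```

The same `(X, y) -> (X, y)` shape is shared by the other methods, so swapping in a different resampler is a one-line change.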
You can read more about the package and its features in this Julia Forem article.
Acknowledgements
Sincere thanks go to Anthony Blaom (@ablaom) for mentoring me in Google Summer of Code, where this project was proposed, and special thanks also go to Rik Huijzer (@rikh) for his friendliness and for the binary SMOTE implementation in Resample.jl.
P.S. As a new user on Discourse, I am restricted in the number of links this post can include; the Julia Forem article has more links.
Great stuff! It’s great to see the Statistical Learning ecosystem moving forward one step at a time with packages like this and StatisticalMeasures.jl.
One query, would this be able to compose with a stratified sampling scheme? A project I’m working on has two (extremely imbalanced) categories of data, and within each category there are blocks of (highly correlated) entries, and so I must employ a two-level sampling scheme where first I pick blocks, undersampling from the larger category, and then within each block randomly select an entry.
Glad to hear that you have liked it. Thank you.
> One query, would this be able to compose with a stratified sampling scheme?
It composes with anything that operates on (takes and returns) X, y data (where y can also be a column in X).
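For instance, since every method maps `(X, y)` to a resampled `(X, y)`, two steps can be chained directly. A hedged sketch, with function names and `ratios` semantics assumed from the package docs:

```julia
using Imbalance

X = rand(200, 2)
y = [i <= 20 ? "minority" : "majority" for i in 1:200]

# Step 1: oversample the minority class up to half the majority count
Xo, yo = random_oversample(X, y; ratios = 0.5)
# Step 2: undersample so all classes match the smallest class count
Xu, yu = random_undersample(Xo, yo; ratios = 1.0)
```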
> A project I’m working on …
As far as I understand, the idea you are referring to here is cluster sampling. The naive `RandomUndersampler` provided in Imbalance.jl won’t do that; it just deletes examples randomly within each class, irrespective of block, which would only give the desired effect if you have enough data. However, you can also try `ClusterUndersampler`, which is essentially cluster sampling performed on each class, where the groups within each class are decided by k-means. Otherwise, for a hacky solution: if you perform naive random undersampling with `X` holding all the data in the majority category and `y` labeling which block each data point belongs to, then you should be able to set the `ratios` hyperparameter to achieve your desired effect.
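For comparison, the two-level idea itself can be sketched in plain Julia, independent of Imbalance.jl; the block ids and entry data here are made up for illustration:

```julia
using Random

# Level 1: undersample whole blocks; level 2: pick one entry per chosen block.
function two_level_sample(blocks::Dict{Int,Vector{Int}}, nblocks::Int;
                          rng = Random.default_rng())
    chosen = shuffle(rng, collect(keys(blocks)))[1:nblocks]   # sample blocks
    return [rand(rng, blocks[b]) for b in chosen]             # one entry each
end

# 8 blocks of 5 correlated entries each (entry ids 10b+1 .. 10b+5)
blocks = Dict(b => collect(10b .+ (1:5)) for b in 1:8)
picked = two_level_sample(blocks, 3; rng = MersenneTwister(0))
```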
You had me at “well-documented.” But seriously, awesome work!