[ANN] Mice.jl - multiple imputation by chained equations in Julia

tom-metherell · November 9, 2023, 2:05pm

Hi everyone! I'd like to introduce Mice.jl, a package for handling missing data via multiple imputation by chained equations (MICE). It is heavily based on the R package mice.
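For readers new to multiple imputation by chained equations, here is a rough picture of the chained-equations loop. It is written in Python and is not Mice.jl's API. `fit_line` and `mice_two_columns` are hypothetical helpers, and the example is deliberately simplified. It handles only two numeric columns and imputes each one by a simple linear regression on the other plus random residual noise. R's mice uses predictive mean matching by default for numeric data and also draws the regression parameters from their posterior, and neither is done here. Running the loop with several seeds gives the m completed datasets that multiple imputation then analyses separately.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for ys ~ intercept + slope * xs.

    Returns (intercept, slope, residual standard deviation).
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return intercept, slope, statistics.pstdev(resid)


def mice_two_columns(cols, n_iter=10, seed=0):
    """Return one completed copy of two columns (None marks a missing value)."""
    rng = random.Random(seed)
    missing = [[v is None for v in col] for col in cols]
    # Starting values: replace each gap with the observed mean of its column.
    filled = []
    for col in cols:
        mean = statistics.fmean(v for v in col if v is not None)
        filled.append([mean if v is None else v for v in col])
    for _ in range(n_iter):
        # One sweep of the chain: re-impute each column from the other in turn.
        for j in (0, 1):
            k = 1 - j  # index of the predictor column
            obs = [i for i, miss in enumerate(missing[j]) if not miss]
            a, b, sd = fit_line([filled[k][i] for i in obs],
                                [filled[j][i] for i in obs])
            for i, miss in enumerate(missing[j]):
                if miss:
                    # Prediction plus a random residual, so imputations differ.
                    filled[j][i] = a + b * filled[k][i] + rng.gauss(0.0, sd)
    return filled


x = [1.0, 2.0, 3.0, 4.0, None, 6.0]
y = [2.1, None, 6.2, 7.9, 10.1, 12.0]
# m = 5 completed datasets, one per seed; each would be analysed separately.
imputations = [mice_two_columns([x, y], seed=s) for s in range(5)]
```

Observed values are never overwritten. Only the gaps change from one seed to the next, and that spread is what multiple imputation uses to reflect the uncertainty about the missing values.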

I wouldn’t necessarily trust it 100% just yet (it needs a lot more testing!), but I hope it will end up being helpful!

Documentation and examples at https://tom-metherell.github.io/Mice.jl.

nilshg · November 9, 2023, 4:04pm

Finally someone gets around to doing this, thank you! A few questions:

1. Why the focus on DataFrames.jl? It seems like this should work with any Tables.jl-compatible table.
2. Why do you need Plots as a dependency? Plotting recipes can be defined with RecipesBase.jl without taking on the super heavy Plots dependency. If you really have some functionality that requires loading Plots, you should look into package extensions.
3. What’s the performance on the second run?
4. Relatedly, have you looked into PrecompileTools.jl and other strategies to reduce latency?

Really excited to see a native Julia MICE implementation!

tom-metherell · November 9, 2023, 4:17pm

Hi @nilshg! In response:

1. To be honest, I mostly wrote this for myself, so I wasn’t focusing on compatibility with packages that I don’t use, but I’ll raise this as an issue to work on at some point.
2. Ah! Thanks for letting me know; I’ll fix that.
3. In the (super non-rigorous) benchmarking I did, performance on the second run plateaued at about 4x the speed of the R package. But I’ve only been using Julia for about 9 months, so there is probably scope for significant further improvement.
4. I did look into it briefly and found that it didn’t make any difference, but again, that might be because I’m inexpert rather than because it can’t make a difference.

Thanks for your input and I hope this package will improve significantly with time!

juliohm · November 9, 2023, 5:41pm

I second that. I’ve opened an issue in the repository.

If you can switch to Tables.jl, we can easily implement the TableTransforms.jl interface and use MICE alongside tons of other available transforms for tabular data.

sylvaticus · November 10, 2023, 1:46pm

Hello, the Imputation sub-module of BetaML also provides a set of missing-value imputers, some of which can produce multiple imputations.

Currently, the provided imputers are:

FeatureBasedImputer: Impute data using the feature (column) mean, optionally normalised by l-norms of the records (rows) (fastest)
GMMImputer: Impute data using a Generative (Gaussian) Mixture Model (good trade off)
RFImputer: Impute missing data using Random Forests, with optional replicable multiple imputations (most accurate).
UniversalImputer: Impute missing data using a vector (one per column) of arbitrary learning models (classifiers/regressors) that implement m = Model([options]), fit!(m,X,Y) and predict(m,X) (not necessarily from BetaML).
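The UniversalImputer entry describes a plug-in design: any model that can be constructed, fitted and asked to predict can be used to impute a column. The Python sketch below imitates that idea with duck typing. It is not BetaML's API, and `MeanModel`, `NearestNeighbourModel`, `predictors` and `impute_with_models` are hypothetical names. `MeanModel` roughly corresponds to plain column-mean imputation, like FeatureBasedImputer without the optional l-norm normalisation. The procedure is an assumed simplification: a single deterministic pass that first mean-fills every gap so that all columns can act as predictors.

```python
import math
import statistics


class MeanModel:
    """Ignores the predictors and predicts the mean of the training targets."""

    def fit(self, X, y):
        self.mean = statistics.fmean(y)
        return self

    def predict(self, X):
        return [self.mean for _ in X]


class NearestNeighbourModel:
    """1-nearest-neighbour regressor (Euclidean distance)."""

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        preds = []
        for row in X:
            dists = [math.dist(row, train_row) for train_row in self.X]
            preds.append(self.y[dists.index(min(dists))])
        return preds


def predictors(row, j):
    """All values of a row except column j."""
    return [v for k, v in enumerate(row) if k != j]


def impute_with_models(rows, models):
    """Single pass in which models[j] predicts column j from the other columns."""
    n_cols = len(rows[0])
    # Mean-fill first so that every column can serve as a predictor.
    means = [statistics.fmean(r[j] for r in rows if r[j] is not None)
             for j in range(n_cols)]
    filled = [[means[j] if r[j] is None else r[j] for j in range(n_cols)]
              for r in rows]
    for j, model in enumerate(models):
        miss = [i for i, r in enumerate(rows) if r[j] is None]
        if not miss:
            continue
        obs = [i for i, r in enumerate(rows) if r[j] is not None]
        model.fit([predictors(filled[i], j) for i in obs],
                  [rows[i][j] for i in obs])
        preds = model.predict([predictors(filled[i], j) for i in miss])
        for i, p in zip(miss, preds):
            filled[i][j] = p
    return filled


rows = [[1.0, 10.0], [2.2, None], [3.0, 30.0], [None, 12.0]]
mean_filled = impute_with_models(rows, [MeanModel(), MeanModel()])
nn_filled = impute_with_models(rows, [NearestNeighbourModel(), NearestNeighbourModel()])
```

Swapping one model class for another changes the imputation strategy without touching `impute_with_models`. Because both models here are deterministic, each call yields a single imputation. Producing multiple imputations would need a stochastic model, like the Random Forest option above.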

Although multiple imputations are provided (by the models with stochastic imputers), there is no mechanism to pool the results of subsequent analyses run on the imputed datasets.
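The standard way to pool such results is Rubin's rules. Fit the same analysis to each of the m completed datasets to get estimates Q_i with squared standard errors U_i. The pooled estimate is the mean Q̄ of the Q_i. The within-imputation variance Ū is the mean of the U_i, and the between-imputation variance B is the sample variance of the Q_i. The total variance is T = Ū + (1 + 1/m)B. The Python sketch below implements these formulas together with Rubin's original degrees of freedom. `pool_rubin` is a hypothetical helper and is not part of BetaML or Mice.jl.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine m estimates and their squared standard errors via Rubin's rules.

    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b           # total variance
    if b == 0:
        # No spread between imputations: the t reference becomes a normal.
        return q_bar, t, math.inf
    # Rubin's (1987) degrees of freedom, (m - 1) * (1 + 1/r)^2
    # with r = (1 + 1/m) * B / U_bar, written to avoid dividing by U_bar.
    df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df


q, t, df = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

In this example Q̄ = 2, Ū = 0.5, B = 1, T = 0.5 + (4/3)(1) ≈ 1.83 and the degrees of freedom are 3.78125. Later refinements, such as the Barnard-Rubin small-sample degrees of freedom, are not shown.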

tom-metherell · November 13, 2023, 5:45pm

v0.1.0 (hopefully) fixes a number of the issues that have been raised so far. Most importantly, it drops the reliance on DataFrames.jl and uses Tables.jl instead. We’ve taken a small performance hit, though approximately half of it is because I accidentally left a line out of the earlier benchmarking code, oops! I’ll try to improve performance in the near future.

tom-metherell · December 12, 2023, 11:15am

Little update: I’ve made some performance improvements, so Mice.jl is now roughly 2x as fast as before (5-6x as fast as the R equivalent on Linux, ~7x as fast on Windows). The next step is probably to add more methods, and of course if anyone has any requests, please raise an issue!
