Yes, it does look like it. I find that paper extremely hard to read because its algorithm description requires medical/chemical domain knowledge.
This one is much clearer, and the SIRUS algorithm does look very similar. If you want to benchmark that algorithm, iRF (which seems to have been removed from CRAN?), feel free to open a PR that re-uses the R benchmarking logic in https://github.com/rikhuijzer/SIRUS.jl/blob/main/test/rcall.jl. I won’t guarantee that I’ll merge the PR, but at least we can re-use the benchmarking setup and see how well iRF performs.
Given that SIRUS.jl and Julia are often used for smallish research datasets, that feature would make sense, yes. However, like many people, I don’t need it, so I probably won’t implement it myself. Feel free to open a PR; if the implementation does not add too much complexity, it will likely be merged. Code for an MLJ wrapper for imbalanced datasets is in the GitHub issue "Oversampling and undersampling" (Issue #661, alan-turing-institute/MLJ.jl). I’m not sure you need it, but the link may be useful if you do.