Boruta algorithm

Jaidy · February 20, 2022, 9:01pm

Hi everyone,
I would like to use the Boruta algorithm to select features for my project. I first looked for a Julia implementation of Boruta but could not find one. I then found a Python implementation and tried calling it via PyCall, but I cannot figure out how. The following code is from a website:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from boruta import BorutaPy

# Base estimator whose feature importances Boruta relies on
model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)

feat_selector = BorutaPy(
    verbose=2,
    estimator=model,
    n_estimators='auto',
    max_iter=10,  # number of iterations to run
    random_state=42,
)

# X and y are the feature matrix and target, defined elsewhere
feat_selector.fit(np.array(X), np.array(y))
…

I was able to import a random forest, albeit from DecisionTree. Could someone please help me convert the above code to Julia using PyCall? Better still, if there is a Julia implementation of the Boruta algorithm itself, that would be ideal.
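For readers unfamiliar with the idea behind Boruta, here is a minimal sketch of its core step: compare each real feature against shuffled "shadow" copies and count how often it beats the best shadow. This is not the BorutaPy implementation. The importance score used here (absolute Pearson correlation) is a stand-in assumption chosen to keep the example dependency-free, whereas Boruta uses random-forest importances and a statistical test on the hit counts. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean


def abs_correlation(xs, ys):
    """Stand-in importance score: absolute Pearson correlation.

    Assumption for illustration only; Boruta uses random-forest importances.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var_x = sum((a - mx) ** 2 for a in xs)
    var_y = sum((b - my) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def boruta_like_hits(columns, y, n_iter=20, seed=42):
    """Count, per feature, how often it beats the best shadow feature."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_iter):
        # Shadow features: each real column shuffled, which breaks any link to y.
        shadows = [rng.sample(col, len(col)) for col in columns]
        threshold = max(abs_correlation(s, y) for s in shadows)
        for j, col in enumerate(columns):
            if abs_correlation(col, y) > threshold:
                hits[j] += 1
    return hits


# Toy data (hypothetical): feature 0 drives y, feature 1 is pure noise.
data_rng = random.Random(0)
n = 200
informative = [data_rng.gauss(0, 1) for _ in range(n)]
noise = [data_rng.gauss(0, 1) for _ in range(n)]
y = [2.0 * a + data_rng.gauss(0, 0.1) for a in informative]

hits = boruta_like_hits([informative, noise], y)
print(hits)  # hit count per feature, out of n_iter rounds
```

A feature that wins in nearly every round is a candidate to keep, and one that rarely beats the shadows is a candidate to drop. The real algorithm decides this with a statistical test rather than a raw count.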

juliohm · February 20, 2022, 9:30pm

Just sharing an alternative algorithm that is super popular in ML nowadays, in case you haven’t heard of it:

Shapley values work with any ML model. The above implementation is integrated with MLJ.jl so you don’t need to call Python.

I had the chance to try the package in an internal project and it worked really well.
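To illustrate why Shapley values are model-agnostic, here is a minimal sketch that computes them exactly for a single prediction by enumerating every feature ordering. It only needs a `predict` function, so any model can be plugged in. This is not the code of the package mentioned above. The names `shapley_values` and `toy_model` are hypothetical, and replacing absent features with a fixed baseline is one common convention, assumed here.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance by enumerating feature orderings.

    Features not yet "switched on" keep their baseline value (an assumed
    convention for representing an absent feature).
    """
    n = len(instance)
    contributions = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        previous = predict(current)
        for j in order:
            # Switch feature j from its baseline to its actual value
            # and record how much the prediction changes.
            current[j] = instance[j]
            value = predict(current)
            contributions[j] += value - previous
            previous = value
    n_orders = factorial(n)
    return [c / n_orders for c in contributions]


def toy_model(x):
    # Hypothetical model: linear in x[0] and x[1], ignores x[2].
    return 3.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


phi = shapley_values(toy_model, instance=[1.0, 2.0, 5.0], baseline=[0.0, 0.0, 0.0])
print(phi)  # one attribution per feature
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline. Exact enumeration visits n! orderings, so it is only feasible for a handful of features. Practical implementations typically approximate by sampling, which is one reason run time can grow with the number of features and samples.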

Jaidy · February 21, 2022, 12:34am

Thank you so much for pointing me to this algorithm. I hadn’t come across it before. It works well, though it takes a while to run, whereas Boruta is much faster (run directly in Python, not called from Julia via its Python implementation, which I haven’t figured out yet).

Did you see similar run times, or did I make a mistake in incorporating it into my work?

juliohm · February 21, 2022, 11:35am

Hi @Jaidy, I don’t recall it taking a long time. Check the documentation to see whether there is an option to compute only the few most important features, etc.

Also, ask yourself whether speed is a concern in your use case. It often is not: feature importance is typically computed once, before the actual analysis.

Jaidy · February 21, 2022, 11:46am

Very good point, thank you.
