Boruta algorithm

Hi everyone,
I am looking to use the Boruta algorithm to select features for my project. First I looked for a Julia implementation of Boruta, but could not find any. Then I found a Python implementation and tried using it via PyCall, but I cannot figure it out. The following code is from a website:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from boruta import BorutaPy

model = RandomForestRegressor(n_estimators=100, max_depth=5, random_state=42)

feat_selector = BorutaPy(
    verbose=2,
    estimator=model,
    n_estimators='auto',
    max_iter=10,  # number of iterations to run
    random_state=42,
)

feat_selector.fit(np.array(X), np.array(y))

I was able to import random forests, albeit from DecisionTree. Can someone please help me convert the above code to Julia using PyCall? Or, better still, is there a Julia implementation of the Boruta algorithm itself?
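For anyone unfamiliar with what Boruta does, here is a rough sketch of its core idea: compare each real feature against shuffled "shadow" copies and count how often it wins. This is not BorutaPy's implementation. It swaps the random forest importance for a plain absolute correlation so that it runs with the standard library alone, and it only counts hits instead of applying a statistical test. The names `abs_corr` and `shadow_feature_hits` and the toy data are made up for illustration.

```python
import random

def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5

def shadow_feature_hits(columns, y, n_iter=20, seed=42):
    """Count, per feature, how often it beats the best shuffled (shadow) copy."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_iter):
        # Shadow features: each column shuffled, which breaks any link to y.
        shadows = []
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)
            shadows.append(shadow)
        # A real feature scores a hit only if it beats the strongest shadow.
        threshold = max(abs_corr(s, y) for s in shadows)
        for j, col in enumerate(columns):
            if abs_corr(col, y) > threshold:
                hits[j] += 1
    return hits

# Toy data (assumption): y depends on feature 0 only; feature 1 is pure noise.
data_rng = random.Random(0)
x0 = [data_rng.gauss(0, 1) for _ in range(200)]
x1 = [data_rng.gauss(0, 1) for _ in range(200)]
y = [2.0 * a + data_rng.gauss(0, 0.1) for a in x0]

hits = shadow_feature_hits([x0, x1], y)
print(hits)  # hit count per feature, out of n_iter rounds
```

The informative feature should win every round, while the noise feature wins only when it happens to beat the shuffled copies by chance. A real Boruta run would decide from such hit counts, with a statistical test, which features to confirm or reject.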

Just sharing an alternative algorithm that is super popular in ML nowadays, in case you haven’t heard of it:

Shapley values work with any ML model. The above implementation is integrated with MLJ.jl so you don’t need to call Python.
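To make the "any ML model" point concrete, here is a minimal sketch of exact Shapley values computed by brute force. It is not how the package mentioned above works internally; it only illustrates the definition. The function `shapley_values`, the `toy_model`, and the all-zeros baseline are assumptions made for this example. The model is treated as a black-box callable, which is why the approach is model-agnostic.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one input x, enumerating all feature coalitions.

    Features outside a coalition are replaced by their baseline value.
    The cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical toy model: any callable that takes a list of features works.
def toy_model(features):
    return 3.0 * features[0] + 1.0 * features[1] + 0.0 * features[2]

phi = shapley_values(toy_model, x=[1.0, 2.0, 5.0], baseline=[0.0, 0.0, 0.0])
print(phi)
```

For this linear toy model the attributions come out to roughly 3, 2, and 0, and they sum to the difference between the prediction at `x` and at the baseline. Since the exact computation is exponential in the number of features, practical implementations generally rely on sampling or other approximations, which is worth keeping in mind when looking at run time.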

I had the chance to try the package in an internal project and it worked really well.

Thank you so much for pointing me to this algorithm. I hadn’t come across it before. It works well, though it takes a while to run, whereas Boruta (run directly in Python, not called from Julia, which I haven’t figured out yet) is much faster.

Did you see a similar run time, or did I make a mistake in incorporating it into my work?

Hi @Jaidy, I don’t recall it taking a long time. Check the documentation to see if there is an option to compute only the top few most important features, etc.

Also, ask yourself whether speed is really a concern in your use case. It is not always: feature importance is typically computed once, before the actual analysis.

Very good point, thank you.