AutoMLPipeline.jl makes it easy to create complex ML pipeline structures

JP_Nieto · March 5, 2020, 3:09pm

Is there a way to automatically create a visual representation (flowchart) of the pipelines?

ppalmes · March 5, 2020, 3:42pm

I’m working on it. In the meantime, you can use @pipelinex instead of @pipeline to get the pipeline as an expression. There are packages that can turn such an expression into a tree representation. You can try TreeView.jl if you’d like to see the tree structure of the expression.
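To make the idea of "getting the expression" concrete, here is a minimal sketch of how an infix pipeline expression could be rewritten into nested constructor calls. It is not the AutoMLPipeline implementation. The function `rewrite` and the table `OPS` are hypothetical names, and `SelectorPipeline` is an assumed name for whatever `*` maps to; only `Pipeline` and `ComboPipeline` appear in this thread.

```julia
# Hypothetical operator table: infix operator => constructor name.
# :SelectorPipeline is an assumed name, not taken from the package.
const OPS = Dict(:(|>) => :Pipeline, :(+) => :ComboPipeline, :(*) => :SelectorPipeline)

# Leaves (symbols, literals) are returned unchanged.
rewrite(x) = x

# Call expressions whose operator is in OPS become constructor calls;
# every argument is rewritten recursively.
function rewrite(ex::Expr)
    if ex.head == :call && haskey(OPS, ex.args[1])
        return Expr(:call, OPS[ex.args[1]], map(rewrite, ex.args[2:end])...)
    end
    return Expr(ex.head, map(rewrite, ex.args)...)
end

# Nothing is evaluated here: a, b, c, d, e, rf are just symbols.
println(rewrite(:(a |> (b |> d) + (c |> e) |> rf)))
```

Because `+` binds tighter than `|>` in Julia, the two branches are grouped into one combo node before the surrounding linear pipeline is built.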

ppalmes · March 5, 2020, 4:23pm

You can try this. Install AbstractTrees and use print_tree:

using AbstractTrees
using AutoMLPipeline

print_tree(stdout,@pipelinex a |> (b |> d) + (c |> e) |> rf)
:(Pipeline(Pipeline(a, ComboPipeline(Pipeline(b, d), Pipeline(c, e))), rf))
├─ :Pipeline
├─ :(Pipeline(a, ComboPipeline(Pipeline(b, d), Pipeline(c, e))))
│  ├─ :Pipeline
│  ├─ :a
│  └─ :(ComboPipeline(Pipeline(b, d), Pipeline(c, e)))
│     ├─ :ComboPipeline
│     ├─ :(Pipeline(b, d))
│     │  ├─ :Pipeline
│     │  ├─ :b
│     │  └─ :d
│     └─ :(Pipeline(c, e))
│        ├─ :Pipeline
│        ├─ :c
│        └─ :e
└─ :rf
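
If you would rather not install anything, a rough equivalent can be written with Base only. The sketch below assumes the expression printed above, copied in as a quoted literal rather than produced by @pipelinex; `show_pipeline` is a hypothetical helper, not part of AutoMLPipeline or AbstractTrees. It prints one node per line, indented by depth, and folds the constructor name into the parent node instead of listing it as a child.

```julia
# Expression copied from the print_tree output; quoted, so nothing is evaluated.
ex = :(Pipeline(Pipeline(a, ComboPipeline(Pipeline(b, d), Pipeline(c, e))), rf))

# Hypothetical helper: recursively print the pipeline structure.
function show_pipeline(io::IO, node, depth::Int=0)
    indent = "  "^depth
    if node isa Expr && node.head == :call
        # args[1] is the constructor name; the rest are the pipeline stages.
        println(io, indent, node.args[1])
        for child in node.args[2:end]
            show_pipeline(io, child, depth + 1)
        end
    else
        # A leaf: an element such as a, b, or rf.
        println(io, indent, node)
    end
end

show_pipeline(stdout, ex)
```

The root line is `Pipeline`, with the inner `Pipeline` and `rf` as its two children.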

ppalmes · March 9, 2020, 10:19pm

Just an update on the latest feature of the @pipeline macro in AutoMLPipeline. Aside from the |> and + operators, which build Linear and Combo Pipelines, you can now use * to build a Selector Pipeline that picks the best ML learner. Here’s an example:

julia> pcmc = @pipeline disc |> ((catf |> ohe) + (numf |> std)) |> (jrf * ada * sgd * tree * lsvc)
julia> crossvalidate(pcmc,X,Y,"accuracy_score",10)
(mean = 0.7276977412403225, std = 0.033181493759015454, folds = 10)

The Selector Pipeline performs internal cross-validation among the learners: jrf, ada, sgd, tree, lsvc. It then uses the best learner's prediction as its final output.
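
The selection idea can be sketched in plain Julia. This is not the AutoMLPipeline implementation; it is a toy illustration under stated assumptions: the two learners (`majority`, `threshold`), the helpers `cv_accuracy` and `select_best`, and the synthetic one-dimensional data are all hypothetical. Each learner is scored by k-fold cross-validation and the one with the highest mean accuracy is kept.

```julia
using Random, Statistics

# Hypothetical toy learners: each takes training data and returns a predict function.
function majority(X, y)
    label = count(==(1), y) >= length(y) / 2 ? 1 : 0   # most frequent class
    return x -> label
end

function threshold(X, y)
    t = (mean(X[y .== 1]) + mean(X[y .== 0])) / 2      # midpoint of the class means
    return x -> x > t ? 1 : 0
end

# Mean accuracy of one learner over k interleaved folds.
function cv_accuracy(fit, X, y, k)
    n = length(y)
    scores = Float64[]
    for i in 1:k
        test = i:k:n
        train = setdiff(1:n, test)
        predict = fit(X[train], y[train])
        push!(scores, mean(predict.(X[test]) .== y[test]))
    end
    return mean(scores)
end

# Score every candidate and return the name and score of the best one.
function select_best(learners, X, y; k=5)
    scores = [cv_accuracy(fit, X, y, k) for (_, fit) in learners]
    best_score, idx = findmax(scores)
    return first(learners[idx]), best_score
end

# Synthetic data (an assumption for illustration): class 1 is shifted by 2.
rng = MersenneTwister(1)
y = repeat([0, 1], 50)
X = 2.0 .* y .+ randn(rng, 100)

best_name, best_score = select_best(["majority" => majority, "threshold" => threshold], X, y)
println(best_name, " ", best_score)
```

On this balanced data the majority learner can only reach 50% accuracy, so the threshold learner is selected.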

The typical, much longer workflow to pick the best learner and use its output would be:

julia> learners = DataFrame()
julia> for learner in [jrf,ada,sgd,tree,lsvc]
         pcmc = @pipeline disc |> ((catf |> ohe) + (numf |> std)) |> learner
         println(learner.name)
         mean,sd,_ = crossvalidate(pcmc,X,Y,"accuracy_score",10)
         global learners = vcat(learners,DataFrame(name=learner.name,mean=mean,sd=sd))
       end;
julia> @show learners;
learners = 5×3 DataFrame
│ Row │ name                   │ mean     │ sd        │
│     │ String                 │ Float64  │ Float64   │
├─────┼────────────────────────┼──────────┼───────────┤
│ 1   │ rf_k2d                 │ 0.684652 │ 0.0334061 │
│ 2   │ AdaBoostClassifier_1rk │ 0.698086 │ 0.0576059 │
│ 3   │ SGDClassifier_2xI      │ 0.715688 │ 0.0452629 │
│ 4   │ prunetree_pSa          │ 0.578826 │ 0.0459255 │
│ 5   │ LinearSVC_39A          │ 0.730508 │ 0.0494756 │

Based on these results, a user would choose Linear SVC because it has the best mean accuracy (73.1%). The Selector Pipeline also selected Linear SVC and achieved nearly the same performance (72.8%) automatically.
