Is there a way to automatically create a visual representation (flowchart) of the pipelines?
I'm working on it. In the meantime you can use @pipelinex instead of @pipeline to get the underlying expression. There are packages that can turn such an expression into a tree representation; I think you can try TreeView.jl if you want to see the tree structure of the expression.
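For a quick sketch of what I mean (untested, and assuming TreeView.jl's walk_tree accepts the Expr returned by @pipelinex; in IJulia/Pluto the result should render as a TikZ tree):

using AutoMLPipeline
using TreeView

expr = @pipelinex a |> (b |> d) + (c |> e) |> rf  # returns the pipeline expression as an Expr
walk_tree(expr)                                   # build and display the labelled tree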
You can try this. Install AbstractTrees and use print_tree:
using AbstractTrees
using AutoMLPipeline
print_tree(stdout,@pipelinex a |> (b |> d) + (c |> e) |> rf)
:(Pipeline(Pipeline(a, ComboPipeline(Pipeline(b, d), Pipeline(c, e))), rf))
├─ :Pipeline
├─ :(Pipeline(a, ComboPipeline(Pipeline(b, d), Pipeline(c, e))))
│  ├─ :Pipeline
│  ├─ :a
│  └─ :(ComboPipeline(Pipeline(b, d), Pipeline(c, e)))
│     ├─ :ComboPipeline
│     ├─ :(Pipeline(b, d))
│     │  ├─ :Pipeline
│     │  ├─ :b
│     │  └─ :d
│     └─ :(Pipeline(c, e))
│        ├─ :Pipeline
│        ├─ :c
│        └─ :e
└─ :rf
Just an update on the latest feature in the @pipeline call of AutoMLPipeline. Aside from the |> and + operators used for Linear and Combo pipelines, you can now use * to act as a Selector Pipeline that picks the best ML learner. Here's an example:
julia> pcmc = @pipeline disc |> ((catf |> ohe) + (numf |> std)) |> (jrf * ada * sgd * tree * lsvc)
julia> crossvalidate(pcmc,X,Y,"accuracy_score",10)
(mean = 0.7276977412403225, std = 0.033181493759015454, folds = 10)
The Selector Pipeline performs an internal cross-validation among the learners jrf, ada, sgd, tree, and lsvc, and then uses the best learner's prediction as its final output.
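Once cross-validated, the selector pipeline can be trained and used for prediction like any other pipeline. A minimal sketch, assuming AutoMLPipeline's usual fit!/transform! interface and the same disc, catf, ohe, numf, std, X, Y, and learner objects as above:

julia> pcmc = @pipeline disc |> ((catf |> ohe) + (numf |> std)) |> (jrf * ada * sgd * tree * lsvc)
julia> fit!(pcmc, X, Y)           # the selector internally cross-validates jrf, ada, sgd, tree, lsvc and keeps the best
julia> pred = transform!(pcmc, X) # predictions come from the selected best learner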
For comparison, the much longer manual workflow to pick the best learner and use its output would be:
julia> learners = DataFrame()
julia> for learner in [jrf,ada,sgd,tree,lsvc]
pcmc = @pipeline disc |> ((catf |> ohe) + (numf |> std)) |> learner
println(learner.name)
mean,sd,_ = crossvalidate(pcmc,X,Y,"accuracy_score",10)
global learners = vcat(learners,DataFrame(name=learner.name,mean=mean,sd=sd))
end;
julia> @show learners;
learners = 5×3 DataFrame
│ Row │ name                   │ mean     │ sd        │
│     │ String                 │ Float64  │ Float64   │
├─────┼────────────────────────┼──────────┼───────────┤
│ 1   │ rf_k2d                 │ 0.684652 │ 0.0334061 │
│ 2   │ AdaBoostClassifier_1rk │ 0.698086 │ 0.0576059 │
│ 3   │ SGDClassifier_2xI      │ 0.715688 │ 0.0452629 │
│ 4   │ prunetree_pSa          │ 0.578826 │ 0.0459255 │
│ 5   │ LinearSVC_39A          │ 0.730508 │ 0.0494756 │
Based on these results, the user would choose LinearSVC because it has the best performance (73.05%). The Selector Pipeline also used LinearSVC and achieved similar performance (72.77%) automatically.
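For completeness, the same pick can be made programmatically from the learners DataFrame built above instead of reading the table by eye:

julia> best = learners[argmax(learners.mean), :]  # row with the highest mean accuracy
julia> best.name                                  # "LinearSVC_39A" given the results above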