Fit learner using CombineML

What is the issue with the following code? I am using Julia 1.0 on Win10.

using CombineML
using CombineML.Util
using CombineML.Transformers

# # Create a learner
# # Learner with default settings
# learner = PrunedTree()

# # Learner with some of the default settings overridden
# learner = PrunedTree(Dict(
#   :impl_options => Dict(
#     :purity_threshold => 0.5
#   )
# ))

# All learners are called in the same way.
learner = StackEnsemble(Dict(
  :learners => [
    PrunedTree(), 
    RandomForest(),
    DecisionStumpAdaboost()
  ], 
  :stacker => RandomForest()
));

# Create pipeline
pipeline1 = Pipeline(Dict(
  :transformers => [
    OneHotEncoder(), # Encodes nominal features into numeric
    Imputer(), # Imputes NA values
    StandardScaler(), # Standardizes features 
    PCA(),
    learner # Predicts labels on features
  ]
));

# Train
fit!(pipeline1, X, y)

Output: 

UndefVarError: fit! not defined
Stacktrace:
 [1] top-level scope at In[29]:35

FWIW, I get a different error (possibly using different data) than you. It seems to suggest that the types at the end of the pipeline before learner don’t match what fit! expects:

using CombineML.Util
using CombineML.Transformers
import RDatasets
iris = RDatasets.dataset("datasets", "iris")
X = convert(Array, iris[[:SepalLength, :SepalWidth, :PetalLength, :PetalWidth]])
y = convert(Array, iris[:Species]);
learner = StackEnsemble(Dict(
  :learners => [
    PrunedTree(), 
    RandomForest(),
    DecisionStumpAdaboost()
  ], 
  :stacker => RandomForest()
));

# Create pipeline
pipeline1 = Pipeline(Dict(
  :transformers => [
    OneHotEncoder(), # Encodes nominal features into numeric
    Imputer(), # Imputes NA values
    StandardScaler(), # Standardizes features 
    PCA(),
    learner # Predicts labels on features
  ]
));
fit!(pipeline1, X, y)

the error being

ERROR: MethodError: no method matching fit!(::StackEnsemble, ::LinearAlgebra.Adjoint{Float64,Array{Float64,2}}, ::Array{String,1})
Closest candidates are:
  fit!(::StackEnsemble, ::Array{T,2} where T, ::Array{T,1} where T) at /Users/tlienart/.julia/packages/CombineML/p9oKU/src/combineml/ensemble.jl:84
  fit!(::Transformer, ::Array{T,2} where T, ::Array{T,1} where T) at /Users/tlienart/.julia/packages/CombineML/p9oKU/src/types.jl:27
  fit!(::Baseline, ::Array{T,2} where T, ::Array{T,1} where T) at /Users/tlienart/.julia/packages/CombineML/p9oKU/src/combineml/baseline.jl:34

showing that by the time the output of the earlier pipeline steps reaches the learner, its dimensions and/or types are not what fit! accepts: the learner receives a LinearAlgebra.Adjoint wrapper rather than a plain Array{T,2}.

In fact, just applying your learner directly to the data works:

julia> fit!(learner, X, y)
Dict{Symbol,Any} with 4 entries:
  :learners               => Learner[PrunedTree(Decision Tree…
  :keep_original_features => false
  :stacker                => RandomForest(Ensemble of Decision Trees…
  :label_map              => LabelMap (with 3 labels):…

So maybe a couple of pointers to try:

  • use the same using lines as in my example above (CombineML.Util and CombineML.Transformers), i.e. without loading CombineML by itself
  • make sure that the pipeline doesn’t change types and dimensions in ways you don’t expect
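
The MethodError above can be reproduced without CombineML. The sketch below uses a hypothetical function, needs_plain_array, whose signature has the same shape as the candidates listed in the error (Array{T,2} where T). It shows that a lazily transposed matrix does not match such a method until it is copied into a plain Matrix. Which pipeline step actually produces the transposed output is not established here.

```julia
using LinearAlgebra

# Hypothetical stand-in for a method that only accepts a plain 2-d Array,
# like the fit! candidates listed in the MethodError.
needs_plain_array(X::Array{T,2} where T) = size(X)

X = rand(5, 3)
Xt = X'   # lazy Adjoint wrapper around X, not a plain Array

# applicable checks whether any method matches the given argument types.
wrapper_ok = applicable(needs_plain_array, Xt)
plain_ok = applicable(needs_plain_array, Matrix(Xt))  # Matrix(...) copies into a plain Array

println(typeof(Xt))
println("wrapper accepted: ", wrapper_ok, ", plain copy accepted: ", plain_ok)
```

If a step in your own pipeline turns out to return such a wrapper, converting with Matrix(...) before the learner sees the data is one possible workaround.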

@tlienart: Thanks for your quick response.

I have been able to pinpoint the issue. After commenting out PCA() from the pipeline and qualifying fit! as Transformers.fit!, the code works fine. I will investigate separately why PCA is not working. The fit! error probably occurred because I had also loaded StatsBase as one of my data-processing packages, which made it ambiguous which fit! the code should use.
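
A minimal sketch of how such a name clash arises, assuming two hypothetical modules (PkgA and PkgB) that stand in for two packages exporting the same name. It does not load StatsBase or CombineML; it only illustrates why qualifying the call helps.

```julia
# Two hypothetical modules standing in for packages that both export fit!.
module PkgA
export fit!
fit!(x) = "PkgA.fit! called"
end

module PkgB
export fit!
fit!(x) = "PkgB.fit! called"
end

using .PkgA
using .PkgB

# An unqualified call is ambiguous here. On Julia 1.0 this is the situation
# that produces a warning that uses of fit! must be qualified, followed by
# an UndefVarError, so the call is left commented out:
# fit!(1)

# Qualifying the call with the module name selects one function unambiguously.
a = PkgA.fit!(1)
b = PkgB.fit!(1)
println(a)
println(b)
```

The same reasoning applies to any other name exported by more than one loaded package.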

However, score() does not work now, although I am not too worried about that. Do I need to prefix it as well?

# Create pipeline
pipeline1 = Pipeline(Dict(
  :transformers => [
    OneHotEncoder(), # Encodes nominal features into numeric
    Imputer(), # Imputes NA values
    StandardScaler(), # Standardizes features 
#     PCA(),
    learner  # Predicts labels on features
  ]
));

# Train
Transformers.fit!(pipeline1, X_train0, y_train0)

# Predict
predictions = transform!(pipeline1, X)
# Assess predictions
result = score(:accuracy, y, predictions)

Output:

UndefVarError: score not defined

Stacktrace:
 [1] top-level scope at In[30]:3

nice

I think CombineML is unfortunately not being actively maintained anymore (I just tried to go through the example ipynb and a number of things needed fixing). Maybe you could also mention your trouble with PCA?

For the score bit, did you run using CombineML.Util? The score function works fine for me:

# fresh julia
julia> using CombineML.Util
help?> score
search: score isconcretetype searchsorted searchsortedlast searchsortedfirst StackOverflowError

  No documentation found.

  CombineML.Util.score is a Function.

  # 1 method for generic function "score":
  [1] score(metric::Symbol, actual, predicted) in CombineML.Util at /Users/tlienart/.julia/dev/CombineML/src/util.jl:53

Otherwise, you could consider using MLMetrics.

Yes, it is the very first line of my code. I used confusion_matrix from DecisionTree to get the accuracy score.
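
If neither score nor a metrics package is available, plain accuracy is simple enough to compute by hand. The sketch below assumes a hypothetical helper named accuracy (not part of CombineML or DecisionTree) and made-up labels. It returns a fraction between 0 and 1, which may differ in scale from what a library function reports.

```julia
using Statistics  # provides mean on Julia 1.0 and later

# Hypothetical helper: fraction of predictions that equal the true label.
accuracy(actual, predicted) = mean(actual .== predicted)

# Made-up labels purely for illustration.
y_true = ["setosa", "versicolor", "virginica", "setosa"]
y_pred = ["setosa", "virginica", "virginica", "setosa"]

acc = accuracy(y_true, y_pred)
println("accuracy = ", acc)
```

With real data, y_true would be the held-out labels and y_pred the output of transform! on the corresponding features.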