[ANN] LearningHorse.jl: Machine Learning Library

I’m glad to announce that I have released LearningHorse.jl!
LearningHorse is a machine learning library for Julia. Features:

  • Few dependencies.
  • You can build everything from simple models like regression to complex models like neural networks, so it’s recommended for machine learning beginners.

Abstract

You can do the following with this package:

  • Data preprocessing
  • Regression
  • Classification
  • Neural networks
  • Decision trees

Here are some examples for Regression and DecisionTree.
This is a visualization of the DecisionTree using the LearningHorse function:

[decision tree visualization image]

This is a polynomial regression model:

[polynomial regression fit image]

Please see the documentation for details!

Resources

19 Likes

Great work!

I have checked the documentation and it seems very functional and nice to use. I will try the package on the problem I currently have at hand, and I will send you my feedback.

4 Likes

Thank you for checking out the documentation!
I’ll add tutorials soon to make it easier to understand.

2 Likes

Hi,
I started looking into LearningHorse.jl and it looks great.
I’m working with DataFrames.jl, and I noticed that the DataSplitter() function supports DataFrames, but OneHotEncoder() only works with integers.
I wrote a small multiple-dispatch extension for the OneHotEncoder() (OHE) function and am posting it here.
If you think it can contribute to your package, feel free to take it.
P.S. I’m not sure it covers all the cases, but it works for mine, and it can be a good start.

using DataFrames
using LearningHorse   # the OneHotEncoder type comes from here

# One-hot encode a vector of strings: one 0/1 column per unique value.
function (OHE::OneHotEncoder)(data::AbstractVector{T}; prefix="") where {T<:AbstractString}
    prefix = isempty(prefix) ? "OHE_" : prefix * "_"
    unqs = unique(data)
    # DataFrame(Dict(...)) sorts the columns alphabetically by name, so the
    # loop below indexes columns by name rather than by position; positional
    # indexing in enumeration order could silently hit the wrong column.
    out = DataFrame(Dict(string(prefix, unq) => zeros(length(data)) for unq in unqs))
    for k in unqs
        out[findall(isequal(k), data), string(prefix, k)] .= 1
    end
    return out
end


# One-hot encode selected columns of a DataFrame; the encoded columns
# replace the originals. Columns may be given as integer positions,
# Symbols, or strings.
function (OHE::OneHotEncoder)(df::DataFrame, cols::Vector{T}) where {T}
    out = deepcopy(df)
    for col in cols
        # Derive the prefix from however the column was specified. Integer
        # positions are resolved against `out`, which shrinks as columns
        # are replaced, so they stay consistent with `out[!, col]`.
        prefix = col isa Integer ? names(out)[col] :
                 col isa Symbol  ? string(col)    : col
        data = out[!, col]
        out = select(out, Not(col))
        out = hcat(out, OHE(data; prefix=prefix))
    end
    return out
end
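For reference, the string-vector method can also be called on its own; here is a small illustration of what the code above produces (the output columns are sorted by name, since they come from a Dict):

ohe = OneHotEncoder()
ohe(["red", "green", "red"]; prefix="color")
# 3×2 DataFrame with columns color_green and color_red;
# row 2 has color_green == 1.0, rows 1 and 3 have color_red == 1.0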

Using my DataFrame, whose first row is:

DataFrameRow
 Row │ total_bill  tip      sex     smoker  day     time    size  
     │ Float64     Float64  String  String  String  String  Int64 
─────┼────────────────────────────────────────────────────────────
   1 │      16.99     1.01  Female  No      Sun     Dinner      2

after running the code

xdatatofit = select(df, Not([:tip, :smoker]))
ohe = OneHotEncoder()
xdatatofit = ohe(xdatatofit,[:sex, :day, :time])

you get a new DataFrame with the one-hot-encoded columns appended:

DataFrameRow
 Row │ total_bill  size   sex_Female  sex_Male  day_Fri  day_Sat  day_Sun  day_Thur  time_Dinner  time_Lunch 
     │ Float64     Int64  Float64     Float64   Float64  Float64  Float64  Float64   Float64      Float64    
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │      16.99      2         1.0       0.0      1.0      0.0      0.0       0.0          1.0         0.0
2 Likes

The docs are nice, and the “getting started” section does indeed make it look very straightforward to get started!

Can you elaborate on how LearningHorse differs (in philosophy, goals, strengths, etc.) from other machine learning packages/ecosystems such as Flux and MLJ?

2 Likes

Thank you for the great suggestion! I’ll take this code.
I found some bugs in LearningHorse.Preprocessing, so I’ll fix those as well and make a release.
LearningHorse doesn’t yet guarantee correct operation with DataFrames, but I would like to add DataFrame support soon.

I think the features of LearningHorse are:

  • You can easily use various models in Julia
    I aim for a powerful library that can build various models.
  • Few dependencies
    When I use a library, I often get errors in its dependencies (only me?), so I don’t think a library should have too many dependencies.
  • For those learning machine learning with Julia
    Python has not only very advanced libraries dedicated to neural networks, but also libraries that can build various models and are easy for beginners to use. I want LearningHorse to become such a library for Julia.

In other words, I want to make LearningHorse a simple library that is powerful enough to build various models and easy to use even for beginners.

2 Likes

How does the implementation compare to, say, GitHub - Evovest/EvoTrees.jl: Boosted trees in Julia?

I didn’t know about this library. I’m not sure about the exact specifications of the package; from the name and the code it looks focused on decision tree analysis, but looking at the documentation it seems that regression is also supported.

However, EvoTrees seems to differ in that it allows fine-grained settings, such as the loss function used to train the decision tree model.
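For illustration, here is a rough sketch of that kind of configuration. The keyword names follow the EvoTrees documentation and may differ between versions, so treat this as an assumption rather than a verified API:

using EvoTrees

# Configure a boosted-tree regressor with an explicit loss function and
# tree hyperparameters (keyword names assumed from the EvoTrees docs;
# older versions used loss=:linear instead of :mse).
config = EvoTreeRegressor(loss=:mse, nrounds=100, max_depth=5, eta=0.1)

# Toy data, just to show the shape of the calls.
x_train, y_train = randn(200, 4), randn(200)
model = fit_evotree(config; x_train, y_train)
preds = EvoTrees.predict(model, x_train)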

I’m developing LearningHorse by myself, so each model is still incomplete in many ways, but I want to fix that in the future.

Nice work :slight_smile: Some points that might be interesting for people reading this thread:

  • there are a few similar efforts for “simple/no-deps” ML in Julia (e.g. BetaML)
  • such efforts are great (especially for the author(s)) as learning tools for elementary ML; it’s valuable to have pure Julia code implementing simple models that can be read fairly easily
  • the authors often tend to abandon these efforts after a while, because they move on to learning other things and because it quickly becomes (very) hard to maintain a reasonable number of models; if you look at sklearn, they’ve managed to do it because there’s institutional backing behind it and a lot of contributors as a result (+ now a ton of users)

That shouldn’t mean the effort is not worthwhile and interesting! But it helps explain the difference with MLJ: MLJ had the ambition of providing a backbone for ML, not quite the models themselves but the chaining of ML steps within a full workflow, from data ingestion to prediction, including hyperparameter tuning etc. The models are provided by dedicated libraries in Julia (such as EvoTrees or DecisionTree.jl) or in other languages (ScikitLearn.jl, LightGBM, etc.). These dedicated packages are easier to maintain and to bring to full performance than one sprawling code base.

This is both a blessing and a curse, though. While specific model packages are lacking in Julia, people might think that MLJ is just a wrapper around ScikitLearn, since you would use ScikitLearn models whenever there’s no dedicated or properly working package in Julia. However, this is the vision with which MLJ was built, and in the long term we can be hopeful that there will be more and better pure-Julia model packages (or packages in other languages; it doesn’t really matter, and in some cases it makes more sense to just interface with an existing high-performance library such as XGB or LGBM) that can be added to the ecosystem.

This is just for context, and it answers the question above about the distinction with MLJ: the aims are simply different. MLJ by itself does not really aim to provide the basic ML models; rather, it helps you work with packages that implement such models in a unified workflow.
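As a minimal sketch of that workflow (assuming DecisionTree.jl is installed alongside MLJ; the calls follow the MLJ documentation):

using MLJ

# Load a model type implemented by a dedicated package (DecisionTree.jl);
# MLJ itself only provides the surrounding workflow.
Tree = @load DecisionTreeClassifier pkg=DecisionTree

X, y = @load_iris   # small built-in demo dataset

# Bind the model and data into a machine, then fit and predict.
mach = machine(Tree(), X, y)
fit!(mach)
yhat = predict(mach, X)   # probabilistic predictions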

8 Likes

Thank you, @tlienart, for the information and the comment (I knew about BetaML but not the other one).

Yes, the philosophy of MLJ is very different from the others, giving a common interface to packages that implement different models. I have used it successfully in my ML testing, but other approaches are always welcome, especially when the code is as clear and well documented as this. (I teach ML in academia using R and Python, and I would like to be able to recommend Julia for that area as well; in my opinion DataFrames is great, but the machine learning packages need a bit more time.)

Both approaches have their advantages and disadvantages. For instance, comparing BetaML with MLJ/DecisionTree.jl, the latter is a lot faster (EvoTrees seems nice, too). Also, a more direct package can sometimes be easier to learn.

It is true that a package maintained by a single person is difficult to keep up, so it would be nice to have more people working on these packages (feedback, features, …). I think the first step is for people working on or using such packages to know about the others.