I need to apply some machine learning algorithms to a very big dataset (about 30 GB). I would prefer pure Julia implementations without any dependency on Python. Here is my list of algorithms, with the Python package I would otherwise use for each.
- radial basis function (RBF) support-vector regression (scikit-learn)
- random forest (scikit-learn)
- adaptive boosting (scikit-learn)
- extreme gradient boosting (xgboost)
- artificial neural networks (TensorFlow)
- long short-term memory (TensorFlow)
To the best of my knowledge, Flux and Knet are the two major pure Julia machine learning packages. Which of them should I use to implement these algorithms?
PS: My computer has an Intel i9-9900K CPU, a GeForce GTX 1660 GPU, and 128 GB of RAM.
Flux and Knet are both neural network libraries. They provide functionality for building and training neural networks, but they don't have ready-made implementations of common algorithms like random forests and support vector machines. Both packages are analogous to TensorFlow or PyTorch, not to scikit-learn. To my knowledge there is no well-known Julia package that is directly comparable to scikit-learn, i.e. a batteries-included, general-purpose machine learning library. Others can probably say more, but as far as I know the closest thing we have in Julia is the MLJ framework, which provides a common interface to many machine learning models to facilitate tuning and model comparison, but relies on other packages to actually implement the models.
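To make the point about MLJ concrete, here is a minimal sketch of its common interface wrapping a model that lives in another package. It assumes MLJ and MLJDecisionTreeInterface (the glue package for DecisionTree.jl) are installed; the synthetic data and the `n_trees` value are placeholders, not anything from the original question.

```julia
using MLJ

# Synthetic regression data (200 rows, 5 features), standing in for a real dataset.
X, y = make_regression(200, 5; rng=1)

# Load the model type; the implementation comes from DecisionTree.jl, not from MLJ.
Forest = @load RandomForestRegressor pkg=DecisionTree verbosity=0
model = Forest(n_trees=100)

# A machine binds a model to data; fit! and predict are the same for every model.
mach = machine(model, X, y)
train, test = partition(eachindex(y), 0.8, shuffle=true, rng=1)
fit!(mach, rows=train, verbosity=0)

ŷ = predict(mach, rows=test)
println("held-out RMS error: ", rms(ŷ, y[test]))
```

Swapping in a different implementation should only change the `@load` line, for example `@load EvoTreeRegressor pkg=EvoTrees` if EvoTrees.jl is installed; the `machine`, `fit!`, and `predict` calls stay the same.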
Both, and let us know how you feel about each.
Since you are trying XGBoost, you can check out EvoTrees.jl for a pure Julia implementation of gradient-boosted regression trees (GBRT).
My goal is to get this done quickly while getting better at Julia. I will only try one package. If it does not work, I will go back to Python.
In case you're not aware already, these two options are not mutually exclusive! PyCall.jl lets you mix Julia and Python code freely. See for example the ageron/julia_notebooks repository on GitHub (Julia Jupyter/Colab notebooks), which is very deep learning focused.
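As a rough illustration of mixing the two, the sketch below calls scikit-learn's RBF support-vector regression from Julia through PyCall. It assumes PyCall.jl is installed and that the Python environment it is linked to has scikit-learn available; the random data and the `C` value are placeholders.

```julia
using PyCall

# Import the Python module; this fails if scikit-learn is not installed
# in the Python environment that PyCall is configured to use.
svm = pyimport("sklearn.svm")

# Placeholder data: 100 samples, 3 features, target driven by the first feature.
X = rand(100, 3)
y = X[:, 1] .+ 0.1 .* randn(100)

# Python objects are called with ordinary dot syntax and keyword arguments.
model = svm.SVR(kernel="rbf", C=1.0)
model.fit(X, y)

# The NumPy result is converted back to a Julia array.
ŷ = model.predict(X)
println(typeof(ŷ), " of length ", length(ŷ))
```

This keeps data loading and preprocessing in Julia while borrowing a Python model for the pieces that have no Julia implementation you are happy with.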
There is a scikit-learn library for Julia: ScikitLearn.jl.
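For reference, a minimal sketch of what using it can look like, here with adaptive boosting from the original list. It assumes ScikitLearn.jl is installed; note that models imported with `@sk_import` are the Python scikit-learn ones, reached through PyCall, so this route is not pure Julia. The data and `n_estimators` value are placeholders.

```julia
using ScikitLearn

# @sk_import pulls the estimator from the Python scikit-learn package.
@sk_import ensemble: AdaBoostRegressor

# Placeholder data: 100 samples, 3 features.
X = rand(100, 3)
y = X[:, 1] .+ 0.1 .* randn(100)

model = AdaBoostRegressor(n_estimators=50)

# ScikitLearn.jl exposes the familiar API as Julia functions.
fit!(model, X, y)
ŷ = predict(model, X)
println("predictions: ", length(ŷ))
```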