We are happy to announce the release of KernelMachines.jl.
Kernel machines are a special case of a more general framework, parametric machines—see this article for technical details on the framework.
The key idea of parametric machines in general (and kernel machines in particular) is to build large spaces of “neural network-like” architectures and ensure that those spaces have good geometrical properties: completeness and, for a regularized problem on a finite training dataset, compactness. Then, one can look for optimal solutions for a given problem inside those spaces.
Our ambition is to be able to use ideas from deep learning (parameterizing complex functions by stacking together simpler ones) even in scenarios where deep learning was traditionally less successful, such as small training datasets.
In practice, finite-depth kernel machines are a hybrid between deep neural networks and kernel methods. They are defined by giving not only a kernel function, but also a tuple of integer values, which correspond to the dimensions of the hidden spaces.
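As an illustration, a call might look like the sketch below. Note that everything here except the `KernelMachineRegression` name is an assumption (the keyword name, the data layout, and the toy data are made up for illustration); see the package docs for the actual interface.

```julia
# Hypothetical sketch: the keyword name `dims` and the data layout are
# assumptions for illustration, not the package's documented API.
using KernelMachines

X = randn(2, 50)                 # 50 two-dimensional training points
y = sin.(vec(sum(X, dims = 1)))  # toy targets

# A finite-depth kernel machine is specified by a kernel function together
# with a tuple of hidden-space dimensions, e.g. three hidden spaces of size 4:
km = KernelMachineRegression(X, y; dims = (4, 4, 4))
```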
This package provides a `KernelMachineRegression` method, to fit and predict data using kernel machines, as well as a standard `KernelRegression`, based on kernel ridge regression, for comparison. See this example in the docs to get started.
The implementation relies on Zygote.jl for automatic differentiation and on Optim.jl to minimize the loss function.
`KernelMachineRegression` is a slow method for two reasons, both of which can be addressed.

1. It is slower than standard kernel regression, because there is no closed-form solution: the loss must be minimized iteratively.
2. It shares the poor scaling of kernel methods with the size of the data: the memory cost is quadratic in the number of training points.
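To see what the closed-form shortcut looks like, here is a minimal sketch of standard kernel ridge regression in plain Julia (a toy illustration, not the package's `KernelRegression` implementation): the entire fit is a single linear solve, while the full n×n Gram matrix makes the quadratic memory cost explicit.

```julia
using LinearAlgebra

# Kernel ridge regression has a closed-form solution: α = (K + λI) \ y,
# with predictions f(x) = Σᵢ αᵢ k(xᵢ, x). Kernel machines lack such a
# formula, hence the iterative optimization.
gaussian(u, v; σ = 0.5) = exp(-sum(abs2, u .- v) / (2σ^2))

function kernel_ridge(X, y; λ = 1e-3)
    K = [gaussian(xi, xj) for xi in X, xj in X]  # n×n Gram matrix: quadratic memory
    α = (K + λ * I) \ y                          # single linear solve, no iterations
    return x -> sum(αi * gaussian(xi, x) for (αi, xi) in zip(α, X))
end

# Toy 1D example: fit sin(3t) on 20 points.
X = [[t] for t in range(-1, 1, length = 20)]
y = [sin(3t[1]) for t in X]
f = kernel_ridge(X, y)
```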
The first issue can already be mitigated in practice by interrupting the optimization procedure before it reaches an optimum, for example by allowing a larger relative tolerance on the achieved loss minimum (see the last section here for an example).
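Since the optimization is delegated to Optim.jl, loosening the tolerance amounts to passing different options to the optimizer. A hedged sketch of the general idea (the keyword `f_reltol` is from Optim.jl's `Optim.Options` and may differ across versions; how KernelMachines.jl forwards such options is not shown here):

```julia
using Optim

# Illustration with a standalone objective: a looser relative tolerance on
# the objective value makes the optimizer stop earlier, trading accuracy
# of the minimum for speed.
rosenbrock(x) = (1 - x[1])^2 + 100 * (x[2] - x[1]^2)^2

coarse = optimize(rosenbrock, zeros(2), LBFGS(), Optim.Options(f_reltol = 1e-2))
fine   = optimize(rosenbrock, zeros(2), LBFGS(), Optim.Options(f_reltol = 1e-12))
```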
The second issue will require implementing the Nyström method, which selects a subset of the training data as “anchor points”, and potentially a “batched mode” for data that is too big to fit in memory. Neither is implemented yet, but both are planned features.
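The core of the Nyström idea can be sketched in a few lines of plain Julia (again a toy illustration under stated assumptions, not the planned package implementation): with m anchor points, one stores an n×m and an m×m matrix instead of the full n×n Gram matrix.

```julia
using LinearAlgebra, Random

# Nyström approximation sketch: pick m anchor points and approximate the
# full n×n Gram matrix by K ≈ C * pinv(W) * C', where C is the n×m
# cross-kernel matrix and W the m×m anchor Gram matrix.
# Storage drops from O(n²) to O(n·m).
gaussian(u, v; σ = 1.0) = exp(-sum(abs2, u .- v) / (2σ^2))

function nystrom(X, m; rng = Random.default_rng())
    anchors = X[randperm(rng, length(X))[1:m]]
    C = [gaussian(x, a) for x in X, a in anchors]        # n×m
    W = [gaussian(a, b) for a in anchors, b in anchors]  # m×m
    return C * pinv(W) * C'  # rank-m approximation of the Gram matrix
end

X = [randn(3) for _ in 1:100]
Kapprox = nystrom(X, 10)
```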
A Julia implementation of infinite-depth kernel machines (based on a Volterra integral equation rather than on a finite list of layers) is also planned, but does not yet exist.
Feedback is welcome!