Best libraries for experimenting with small scale models?

I’m looking for tools to do ML experiments in Julia. Specifically, I’d like to try designing some small-ish models (hundreds to thousands of parameters) and to experiment with approaches that aren’t just differently ordered chains of the usual NN layers. Basically, what I’d like is a framework that provides me with:

  • automatic differentiation
  • standard optimization algorithms
  • loss functions

and very little more, otherwise leaving me almost complete freedom to write the model as an arbitrary function. Also, running on the CPU is perfectly fine. Is there something that fits these requirements? Lux.jl seemed like a possible choice, but I haven’t dug deep into it.

Lux.jl is certainly a good option. Depending on exactly what you need, you may also be fine with just DifferentiationInterface.jl (for autodiff), any optimization package (Optimisers.jl, Optim.jl, Optimization.jl), and custom loss functions, or the losses in Flux.jl, Lux.jl, or LossFunctions.jl.
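
For concreteness, here is a minimal sketch of that combination, assuming ForwardDiff.jl as the AD backend and Optimisers.jl for the updates; the predict function, its parameter vector θ, and the hand-written squared-error loss are made up purely for illustration:

using DifferentiationInterface
import ForwardDiff                  # any supported AD backend; ForwardDiff is assumed here
import Optimisers

predict(θ, x) = θ[3] * tanh(θ[1] * x + θ[2])   # hypothetical model: any function of θ
loss(θ, x, y) = (predict(θ, x) - y)^2          # hand-written squared error

θ = randn(3)
g = gradient(θ -> loss(θ, 0.5, 0.4), AutoForwardDiff(), θ)  # automatic differentiation

state = Optimisers.setup(Optimisers.Adam(0.01), θ)          # standard optimization rule
state, θ = Optimisers.update!(state, θ, g)                  # one update step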

My completely unbiased recommendation is that you try Flux… or just Optimisers.jl + Zygote.jl (or Enzyme.jl).

Flux regards any callable struct containing parameters as a model. The readme gives this example, in which the anonymous function x -> ... captures 3 arrays:

model = let
  w, b, v = (randn(Float32, 23) for _ in 1:3)  # parameters
  x -> sum(v .* tanh.(w*x .+ b))               # callable
end
typeof(model)  # var"#52#54"{Vector{Float32}, Vector{Float32}, Vector{Float32}}
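
Calling it is then just an ordinary function call:

model(0.5f0)  # returns a Float32 scalar; the value depends on the random parameters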

If you take a derivative with respect to model using Zygote.jl (or Enzyme.jl), you get back another struct-like object (for Zygote, a NamedTuple) with matching fields:

using Zygote

data = [(x, 2x - x^3) for x in -2:0.1f0:2];
grads = Zygote.gradient((m, x, y) -> (m(x) - y)^2, model, data[1]...)
model.w .-= 0.1 .* grads[1].w  # grads[1] has the same fields as model

Instead of plain gradient descent we can apply, say, Adam to all the parameters like this. Optimisers.jl again understands structs with matching fields:

using Optimisers

optstate = Optimisers.setup(Adam(), model)
Optimisers.update!(optstate, model, grads[1]);  # mutates the captured parameter arrays in place

That covers two of your three bullet points, and we haven’t loaded Flux.jl yet. Many loss functions are so simple that you can just write them out, like (m(x) - y)^2, but Flux.jl is one place to get a library of standard ones (plus standard model-building layers, which it sounds like you may not want).
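
Putting the pieces together, a full training loop needs nothing beyond Zygote.jl and Optimisers.jl. This is just a sketch continuing from the snippets above (reusing model and data), with the loss written out by hand; Flux’s mse would do the same job if you did load it:

loss(m, x, y) = (m(x) - y)^2                 # hand-written squared error

optstate = Optimisers.setup(Adam(0.01), model)
for epoch in 1:100, (x, y) in data
  g = Zygote.gradient(m -> loss(m, x, y), model)[1]
  Optimisers.update!(optstate, model, g)     # updates the captured arrays in place
end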

Instead of the “anonymous struct” created by x -> ..., you can define your own struct MyModel; w::Vector{Float32}; ... and make it callable with (m::MyModel)(x) = sum(m.v .* tanh.(m.w*x .+ m.b)). Fluxperimental.jl has some tools for making this a little easier (and easier to revise), but using basic Julia is fine too.
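
Spelled out, that might look like the sketch below, reusing data and Zygote from earlier. Depending on your Functors.jl version, you may also need Functors.@functor MyModel (or Flux.@layer MyModel) before Optimisers.setup will treat the fields as trainable:

struct MyModel
  w::Vector{Float32}
  b::Vector{Float32}
  v::Vector{Float32}
end

(m::MyModel)(x) = sum(m.v .* tanh.(m.w * x .+ m.b))   # make the struct callable

model2 = MyModel((randn(Float32, 23) for _ in 1:3)...)
grads2 = Zygote.gradient((m, x, y) -> (m(x) - y)^2, model2, data[1]...)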

Yeah, I was considering something like this. Simple loss functions like squared error or cross-entropy are no problem, but it’s still nice to have standard implementations whenever possible.

It would not be crazy to split these functions out into their own little package, if someone wants it enough. Several such pieces have already been moved out of Flux into their own packages, e.g. Tracker.jl, OneHotArrays.jl, and arguably Optimisers.jl.

I have no idea how (or whether) it fits the rest of your requirements, but “small-ish models” and “running on CPU” brought to mind SimpleChains.jl: Doing small network scientific machine learning in Julia 5x faster than PyTorch

I think that would in general be a good idea. I don’t really see a reason why all the deep learning frameworks shouldn’t share a single implementation of the loss functions, but I might be missing something.