Best libraries for experimenting with small scale models?

I’m looking for tools to do ML experiments in Julia. Specifically, I’d like to try designing some small-ish models (hundreds to thousands of parameters) and to experiment with approaches that aren’t just differently ordered chains of the usual NN layers. Basically, what I’d like is a framework that provides me with:

  • automatic differentiation
  • standard optimization algorithms
  • loss functions

and very little more, otherwise leaving me almost complete freedom to write the model as an arbitrary function. Also, running on the CPU is perfectly fine. Is there something that fits these requirements? Lux.jl seemed like a possible choice, but I haven’t dug deep into it.

Lux.jl is certainly a good option. Depending on exactly what you need, you may also be fine with just DifferentiationInterface.jl (for autodiff), any optimization package (Optimisers.jl, Optim.jl, Optimization.jl), and custom loss functions, or the losses in Flux.jl, Lux.jl, or LossFunctions.jl.
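
For concreteness, here is a minimal sketch of that combination, assuming ForwardDiff.jl as the AD backend and Optimisers.jl for the updates; the predict function, its parameter vector θ, and the hand-written squared-error loss are made up purely for illustration:

using DifferentiationInterface
import ForwardDiff                  # any supported AD backend; ForwardDiff is assumed here
import Optimisers

predict(θ, x) = θ[3] * tanh(θ[1] * x + θ[2])   # hypothetical model: any function of θ
loss(θ, x, y) = (predict(θ, x) - y)^2          # hand-written squared error

θ = randn(3)
g = gradient(θ -> loss(θ, 0.5, 0.4), AutoForwardDiff(), θ)  # automatic differentiation

state = Optimisers.setup(Optimisers.Adam(0.01), θ)          # standard optimization rule
state, θ = Optimisers.update!(state, θ, g)                  # one update step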

My completely unbiased recommendation is that you try Flux… or just Optimisers.jl + Zygote.jl (or Enzyme.jl).

Flux regards any callable struct containing parameters as a model. The readme gives this example, in which the anonymous function x -> ... captures 3 arrays:

model = let
  w, b, v = (randn(Float32, 23) for _ in 1:3)  # parameters
  x -> sum(v .* tanh.(w*x .+ b))               # callable
end
typeof(model)  # var"#52#54"{Vector{Float32}, Vector{Float32}, Vector{Float32}}
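
Calling it is then just an ordinary function call:

model(0.5f0)  # returns a Float32 scalar; the value depends on the random parameters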

If you take a derivative with respect to model using Zygote.jl (or Enzyme.jl), you get back another struct-like object (for Zygote, a NamedTuple) with matching fields:

using Zygote

data = [(x, 2x - x^3) for x in -2:0.1f0:2];
grads = Zygote.gradient((m, x, y) -> (m(x) - y)^2, model, data[1]...)
model.w .-= 0.1 .* grads[1].w  # grads[1] has the same fields as model

Instead of plain gradient descent we can apply, say, Adam to all the parameters like this. Optimisers.jl again understands structs with matching fields:

using Optimisers

optstate = Optimisers.setup(Adam(), model)
Optimisers.update!(optstate, model, grads[1]);  # mutates the captured parameter arrays in place

That covers two of your three bullet points, and we haven’t loaded Flux.jl yet. Many loss functions are so simple that you can just write them out, like (m(x) - y)^2, but Flux.jl is one place to get a library of standard ones (plus standard model-building layers, which it sounds like you may not want).
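
Putting the pieces together, a full training loop needs nothing beyond Zygote.jl and Optimisers.jl. This is just a sketch continuing from the snippets above (reusing model and data), with the loss written out by hand; Flux’s mse would do the same job if you did load it:

loss(m, x, y) = (m(x) - y)^2                 # hand-written squared error

optstate = Optimisers.setup(Adam(0.01), model)
for epoch in 1:100, (x, y) in data
  g = Zygote.gradient(m -> loss(m, x, y), model)[1]
  Optimisers.update!(optstate, model, g)     # updates the captured arrays in place
end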

Instead of the “anonymous struct” created by x -> ..., you can define your own struct MyModel; w::Vector{Float32}; ... and make it callable with (m::MyModel)(x) = sum(m.v .* tanh.(m.w*x .+ m.b)). Fluxperimental.jl has some tools for making this a little easier (and easier to revise), but using basic Julia is fine too.
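
Spelled out, that might look like the sketch below, reusing data and Zygote from earlier. Depending on your Functors.jl version, you may also need Functors.@functor MyModel (or Flux.@layer MyModel) before Optimisers.setup will treat the fields as trainable:

struct MyModel
  w::Vector{Float32}
  b::Vector{Float32}
  v::Vector{Float32}
end

(m::MyModel)(x) = sum(m.v .* tanh.(m.w * x .+ m.b))   # make the struct callable

model2 = MyModel((randn(Float32, 23) for _ in 1:3)...)
grads2 = Zygote.gradient((m, x, y) -> (m(x) - y)^2, model2, data[1]...)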

Yeah, I was considering something like this. Simple loss functions like squared error or cross-entropy are no problem, but it’s still nice to have standard implementations whenever possible.

It would not be crazy to split these functions out into their own little package, if someone wants it enough. Several such pieces have already been moved out of Flux into their own packages, e.g. Tracker.jl, OneHotArrays.jl, and arguably Optimisers.jl.

I have no idea how (or whether) it fits the rest of your requirements, but “small-ish models” and “running on CPU” brought to mind SimpleChains.jl: Doing small network scientific machine learning in Julia 5x faster than PyTorch

I think that would in general be a good idea. I don’t really see a reason why all the deep learning frameworks shouldn’t share a single implementation of the loss functions, but I might be missing something.