My completely unbiased recommendation is that you try Flux… or just Optimisers.jl + Zygote.jl (or Enzyme.jl).
Flux regards any callable struct containing parameters as a model. The readme gives this example, in which the anonymous function `x -> ...` captures 3 arrays:
```julia
model = let
    w, b, v = (randn(Float32, 23) for _ in 1:3)  # parameters
    x -> sum(v .* tanh.(w*x .+ b))               # callable
end

typeof(model)  # var"#52#54"{Vector{Float32}, Vector{Float32}, Vector{Float32}}
```
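Calling it works like calling any other function, and the captured arrays become fields of that closure type, which is what lets the code below write `model.w`. A quick check (the outputs depend on the random parameters):

```julia
model(0.5f0)               # evaluate the model at x = 0.5, returning a Float32
fieldnames(typeof(model))  # some ordering of (:w, :b, :v), the captured arrays
model.w                    # one of the length-23 parameter vectors
```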
If you take a derivative with respect to `model` using Zygote.jl (or Enzyme.jl), you get another struct which contains matching fields:
```julia
using Zygote

data = [(x, 2x - x^3) for x in -2:0.1f0:2];

grads = Zygote.gradient((m, x, y) -> (m(x) - y)^2, model, data[1]...)

model.w .-= 0.1 .* grads[1].w  # grads[1] has the same fields as model
```
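Wrapping those two lines in a loop gives plain gradient descent over the whole dataset. A rough sketch (the 0.1 learning rate and the 1000 passes are arbitrary choices, not anything tuned):

```julia
using Zygote

loss(m, x, y) = (m(x) - y)^2

for epoch in 1:1000                                   # arbitrary number of passes
    for (x, y) in data
        g = Zygote.gradient(loss, model, x, y)[1]     # gradient w.r.t. the model only
        # update each captured array in place, field by field
        model.w .-= 0.1 .* g.w
        model.b .-= 0.1 .* g.b
        model.v .-= 0.1 .* g.v
    end
end
```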
Instead of gradient descent, we can use, say, Adam, applied to all the parameters, like this (Optimisers.jl again understands structs with matching fields):
```julia
using Optimisers

optstate = Optimisers.setup(Adam(), model)

Optimisers.update!(optstate, model, grads[1]);
```
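In a loop this looks much the same. Here is a sketch with a made-up `train!` helper and an arbitrary epoch count, following the Optimisers.jl pattern of re-using the returned state and model:

```julia
using Optimisers, Zygote

function train!(model, data; epochs = 1000)           # epoch count is an arbitrary choice
    optstate = Optimisers.setup(Adam(), model)        # one state entry per parameter array
    for _ in 1:epochs, (x, y) in data
        g = Zygote.gradient(m -> (m(x) - y)^2, model)[1]
        optstate, model = Optimisers.update!(optstate, model, g)
    end
    return model
end

model = train!(model, data)
```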
That covers 2 of your 3 bullet points, and we haven't loaded Flux.jl yet. Many loss functions are so simple you can just write them out, like `(m(x) - y)^2`. But Flux.jl is one place to get a library of standard ones. (Plus standard model-building layers, which it sounds like you may not want.)
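For instance, `Flux.mse` is the built-in mean squared error, which for these scalar outputs is the same thing as the hand-written loss. A sketch, assuming the `model` and `data` from above:

```julia
using Flux, Zygote

# Flux.mse is Flux's mean squared error; with scalar m(x) and y it equals (m(x) - y)^2
loss(m, x, y) = Flux.mse(m(x), y)

Zygote.gradient(loss, model, data[1]...)  # gradients, exactly as before
```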
Instead of the “anonymous struct” created by `x -> ...`, you can make your own `struct MyModel; w::Vector{Float32}; ...`, and make it callable with `(m::MyModel)(x) = sum(m.v .* tanh.(m.w*x .+ m.b))`. Fluxperimental.jl has some tools for making this a little easier (and easier to revise), but using basic Julia is fine too.
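Spelled out, that might look something like this (a sketch, not the Fluxperimental.jl way; the `Functors.@functor` line tells Optimisers.jl which fields to walk, and may be optional on recent Functors.jl versions, which opt structs in by default):

```julia
using Functors, Zygote

struct MyModel
    w::Vector{Float32}
    b::Vector{Float32}
    v::Vector{Float32}
end
Functors.@functor MyModel   # possibly redundant with Functors.jl >= 0.5

# make it callable, exactly like the closure above
(m::MyModel)(x) = sum(m.v .* tanh.(m.w*x .+ m.b))

model2 = MyModel(randn(Float32, 23), randn(Float32, 23), randn(Float32, 23))

# same gradient call as before, using the data defined above
grads2 = Zygote.gradient((m, x, y) -> (m(x) - y)^2, model2, data[1]...)
```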