Training a simple linear model in Flux

I’m trying to get started with Flux, but I’m finding the documentation a bit lacking when it comes to explaining the basic functionality.

For example, how would I train the following linear model with a single scalar parameter b using ADAM?

julia> X = randn(100);

julia> Y = 0.5 .* X .+ randn(100);

julia> loss(xs, ys, b) = sum((ys .- b .* xs).^2)
loss (generic function with 1 method)

julia> Flux.Tracker.update!(Flux.ADAM(), Params(0.0), b -> gradient(b0 -> loss(X,Y,b0), b))
ERROR: MethodError: no method matching getindex(::getfield(Main, Symbol("##183#185")), ::Float64)
Closest candidates are:
  getindex(::Any, ::AbstractTrees.ImplicitRootState) at /home/me/.julia/packages/AbstractTrees/z1wBY/src/AbstractTrees.jl:344
 [1] update!(::ADAM, ::Params, ::Function) at /home/me/.julia/packages/Flux/qXNjB/src/optimise/train.jl:11
 [2] top-level scope at REPL[245]:1

The following should work.

First, introduce some packages…

# Packages
using Flux
using Plots; pyplot()
using LaTeXStrings
using Statistics

Next, your data that you wish to fit a model to:

X = rand(100)
Y = 0.5X + rand(100)
scatter(X, Y, label="data")
plot!(xlabel=L"x", ylabel=L"y", title="Data for linear model")

The data may look as follows:

Next, put your data in data arrays of a shape that Flux can use:

# Preparing data in the data structure Flux expects:
# each column is one sample, so Xd and Yd are 1×100 matrices
Xd = reduce(hcat, X)
Yd = reduce(hcat, Y)
data = [(Xd, Yd)]
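To see why `reduce(hcat, …)` is used here: Flux’s `Dense` layer treats each column as one sample, so the 100-element vectors become 1×100 matrices. A quick self-contained check:

```julia
X = rand(100)
Xd = reduce(hcat, X)   # 1×100 Matrix: one row (the feature), one column per sample
size(Xd)               # (1, 100)
```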

Next, set up the Flux model:

# Set up Flux problem
# Model 
mod = Dense(1,1)
# Initial mapping (model prediction with the initial, random parameters)
Yd_0 = mod(Xd)
# Setting up loss/cost function
loss(x, y) = mean((mod(x).-y).^2)
# Selecting parameter optimization method
opt = ADAM(0.01, (0.99, 0.999))
# Extracting parameters from model
par = params(mod);


  • Since you have a monovariable, linear (affine) mapping, a single layer (no hidden layers) with a linear activation function (default) is sufficient.
  • Dense is the Flux name for the standard Feedforward Neural Net (FNN) block.
  • Yd_0 is the mapping from x to y with the initial (randomly generated) set of model parameters in model mod.
  • In the last line, I name the parameters by par so that I can refer to par in the next code block where I train the “network” (the linear model).

Next, you need to train the model against the data – a major iteration is denoted an “epoch”:

# Training over nE epochs
nE = 1_000
for i in 1:nE
    Flux.train!(loss, par, data, opt)
end
# Final mapping
Yd_nE = mod(Xd);

Here, Yd_nE is the mapping from x to y with the model parameters as they are after nE epochs:

scatter(X, Y, label="data")
plot!(X, vec(Yd_nE), label="model after training")
plot!(xlabel=L"x", ylabel=L"y", title="Data for linear model")

… and then the result:

Of course, in this case, it would be much simpler and faster to solve the model using linear algebra.
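For reference, here is a sketch of that direct least-squares solution (assuming the `X` and `Y` defined above; `b_ls` and `coeffs` are just illustrative names):

```julia
# Direct least-squares estimate of the slope b in y ≈ b*x:
# for vectors, X \ Y solves the one-parameter least-squares problem (X'X) \ (X'Y)
b_ls = X \ Y

# To fit an intercept as well, augment with a column of ones:
coeffs = [ones(length(X)) X] \ Y   # coeffs[1] = intercept, coeffs[2] = slope
```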


This should be on a blog or something. Nice and simple!


Yes, I think it is a simple introduction. One would just need to add a nonlinear case and a multi-input case, plus a few more things, and it would be a simple way to get people started with Flux.

I’m not really a blogger; I would have to figure out how to set up a blog myself, were I to do it.