Types and gradients, including Forward.gradient

Many thanks for the explanation, though I'm not sure I follow fully. The
algorithm should work so that the point to take the gradient at (what
you refer to as w0) is the previous estimate of w. In other words, w0
needs to be indexed in some way so that it relates to the correct w. I
cannot see how w, w0, and out relate to each other. Somewhat sloppy
pseudocode would be:

Merge x and y so that the variables are in the rows (of dim p)
Define loss(w) = sum((y - w * x) ^ 2) / n
Initialize w as a niter-by-p matrix and pick a starting point w[1, :]
Set stepsize lr = 0.1
For i = 2 to niter
  For j = 1 to p
     w[i, j] = w[i-1, j] - lossgradient(w[i-1, :])[j] * lr
  End
End
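To make my question concrete, here is a minimal sketch of what I mean, written in Python/NumPy purely for illustration (the names `loss`, `loss_gradient`, `lr`, and `niter` mirror the pseudocode; `loss_gradient` is the analytic least-squares gradient standing in for `lossgradient`, and the data is made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: n observations of p variables, plus a noisy target.
n, p = 100, 3
x = rng.normal(size=(n, p))           # one variable per column
w_true = np.array([1.0, -2.0, 0.5])   # assumed "true" weights for the demo
y = x @ w_true + 0.01 * rng.normal(size=n)

def loss(w):
    # Mean squared error: sum((y - x w)^2) / n
    r = y - x @ w
    return (r @ r) / n

def loss_gradient(w):
    # Analytic gradient of the loss with respect to the *whole* vector w,
    # evaluated at the previous iterate (the "w0" of this thread).
    return -2.0 / n * x.T @ (y - x @ w)

niter, lr = 200, 0.1
w = np.zeros((niter, p))              # row i holds the i-th iterate; row 0 is the start
for i in range(1, niter):
    # Each new iterate uses the gradient taken at the previous iterate;
    # componentwise: w[i, j] = w[i-1, j] - loss_gradient(w[i-1])[j] * lr.
    w[i] = w[i - 1] - lr * loss_gradient(w[i - 1])

print(w[-1])  # the final estimate, close to w_true
```

The point I want to check is in the inner update: the gradient is taken at the full previous iterate w[i-1, :], and only then is component j of that gradient used, rather than differentiating with respect to a single scalar w[i-1, j].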