Many thanks for the explanation, though I'm not sure I follow fully. The
algorithm should work so that the points the gradient is evaluated at (what
you refer to as w0) are the old estimates of w. In other words, w0 needs to
be indexed in some way so that it relates to the correct w. I cannot see how
w, w0 and out relate to each other. Somewhat sloppy pseudocode would be:
Merge x and y so that the variables are in the rows (of dim p)
Define loss = sum((y - w * x)^2) / n
Initialize w as a niter-by-p matrix and set its first row to a starting guess
Set stepsize lr = 0.1
For i = 2 to niter
    For j = 1 to p
        w[i, j] = w[i-1, j] - lossgradient(w[i-1, j]) * lr
    End
End
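To make the indexing concrete, here is a minimal NumPy sketch of that loop: each new estimate `w[i]` is computed from the previous row `w[i-1]`, so the gradient is always evaluated at the old point (the role w0 plays above). All names, the data, and the `loss_gradient` helper are illustrative assumptions, not the actual code being discussed:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
x = rng.normal(size=(n, p))            # design matrix, one column per variable
w_true = np.array([1.0, -2.0, 0.5])
y = x @ w_true                         # noiseless targets, just for the demo

def loss_gradient(w_old, x, y):
    """Gradient of the mean squared error sum((y - x w)^2) / n,
    evaluated at the *old* estimate w_old (the "w0")."""
    return -2.0 / len(y) * x.T @ (y - x @ w_old)

niter, lr = 200, 0.1
w = np.zeros((niter, p))               # row i holds the i-th estimate of w
for i in range(1, niter):
    # the whole update for row i uses only the previous row's estimate
    w[i] = w[i - 1] - lr * loss_gradient(w[i - 1], x, y)
```

The inner `for j` loop from the pseudocode collapses into one vectorised row update here, but the dependence is the same: every component of `w[i]` is computed from `w[i-1]`, never from a partially updated `w[i]`.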