Solved.
All I had to do was add a more adaptive learning-rate adjustment and train for much longer on a very small dataset, deliberately overfitting it.
Now the loss goes all the way down to machine zero.
The learning-rate adjustment looked something like this:
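```julia
# Inside the training loop, after each optimisation step.
# `min_loss` holds the previous step's loss and must be initialised (e.g. to Inf) before the loop.
lss = loss_function(model, ps, st, x, y)[1]   # current loss
if min_loss > lss
    # Loss decreased: shrink the learning rate by 10%
    lr_ = lr_ * 0.9
else
    # Loss did not decrease: grow the learning rate by 1%, capped at 1e-3
    lr_ = min(lr_ * 1.01, 1e-3)
end
Optimisers.adjust!(st_opt, lr_)   # push the new rate into the optimiser state
min_loss = lss                    # remember this loss for the next comparison
```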
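To make the rule easier to inspect on its own, here is a minimal sketch that pulls it out of the Lux/Optimisers loop into a pure function. The names `next_lr` and `replay` are hypothetical, and the loss sequence is made up purely for illustration; in the real loop the returned rate would still be passed to `Optimisers.adjust!(st_opt, lr_)`.

```julia
# Hypothetical helper (not from the post): the learning-rate rule as a pure function.
# `prev_loss` plays the role of `min_loss` in the snippet above.
function next_lr(lr, loss, prev_loss; shrink = 0.9, grow = 1.01, lr_max = 1e-3)
    if prev_loss > loss
        return lr * shrink              # loss decreased: scale the rate down
    else
        return min(lr * grow, lr_max)   # loss did not decrease: scale up, capped
    end
end

# Replay the rule over a sequence of losses (illustrative numbers only).
function replay(losses; lr = 1e-3)
    prev_loss = Inf                     # first step always counts as an improvement
    rates = Float64[]
    for loss in losses
        lr = next_lr(lr, loss, prev_loss)
        push!(rates, lr)
        prev_loss = loss
    end
    return rates
end

rates = replay([1.0, 0.5, 0.6, 0.4])
```

Replaying `[1.0, 0.5, 0.6, 0.4]` gives rates of roughly 9.0e-4, 8.1e-4, 8.18e-4 and 7.36e-4: the rate shrinks whenever the loss improves and creeps back up, never beyond 1e-3, when it does not.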