Hello,
I’ve recently started using Flux.jl. I’ve seen that Flux provides different kinds of optimizers, such as Descent, ADAM, and so on.
I was just wondering: does the Descent optimizer perform full-batch gradient descent (i.e. the gradient is accumulated over all examples), or does it perform mini-batch/stochastic gradient descent (i.e. the gradient is computed over a smaller number of examples given by the batch size)? If it’s the latter, how can I select the batch size?
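For concreteness, this is roughly the kind of training setup I have in mind. It’s just a sketch based on my reading of the docs, assuming the classic implicit-parameters `Flux.train!` API together with `Flux.DataLoader`; I’m not sure whether the `batchsize` keyword here is really what controls this, or whether the optimizer itself plays a role:

```julia
using Flux

# Toy data: 100 examples with 4 features each, plus scalar targets.
X = rand(Float32, 4, 100)
Y = rand(Float32, 1, 100)

model = Dense(4, 1)
loss(x, y) = Flux.Losses.mse(model(x), y)
opt = Descent(0.01)                      # plain gradient-descent update rule

# My guess is that the batch size is chosen here, in the data iterator,
# rather than in the optimizer itself:
data = Flux.DataLoader((X, Y), batchsize = 10, shuffle = true)  # mini-batches of 10 examples
# data = [(X, Y)]                        # whereas a single tuple would mean full-batch descent?

Flux.train!(loss, Flux.params(model), data, opt)
```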
Thanks in advance!