Gradient Descent Optimizer in Flux.jl

#1

Hello,

I’ve recently started using Flux.jl. I’ve seen that Flux provides different kinds of optimizers, such as Descent, ADAM, and so on.

I was just wondering: does the Descent optimizer perform full-batch gradient descent (i.e. summing the gradient over all examples), or does it perform mini-batch/stochastic gradient descent (i.e. summing the gradient over a smaller number of examples determined by the batch size)? If it’s the latter, how can I select the batch size?

Thanks in advance!

#2

I believe it behaves as stochastic gradient descent in typical use, but for definitive verification you can look at the source code.
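To clarify a bit: `Descent(η)` simply applies `p .-= η .* g` to whatever gradient it is given, so the optimizer itself does no summing over examples. Whether training is full-batch or mini-batch is determined by how you feed data to the training loop. The batch size is chosen in `Flux.DataLoader` via its `batchsize` keyword. Here is a minimal sketch (assuming a recent Flux version with the `Flux.setup`/`Flux.update!` API; the toy data and model are made up for illustration):

```julia
using Flux

# Toy data: 100 examples with 4 features each, scalar targets.
X = rand(Float32, 4, 100)
Y = rand(Float32, 1, 100)

model = Dense(4 => 1)

# Plain gradient descent with learning rate 0.01.
opt_state = Flux.setup(Descent(0.01), model)

# The batch size is selected HERE, in the DataLoader, not in Descent.
loader = Flux.DataLoader((X, Y); batchsize = 10, shuffle = true)

for (x, y) in loader
    # Gradient is computed over the current mini-batch of 10 examples.
    grads = Flux.gradient(m -> Flux.Losses.mse(m(x), y), model)
    Flux.update!(opt_state, model, grads[1])
end
```

If you instead pass the entire dataset as a single batch (e.g. `batchsize = 100` here, or just calling the loss on `(X, Y)` directly), the same `Descent` optimizer performs full-batch gradient descent.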