SGD in Flux.jl

I'm not getting what you're saying.

The S in SGD doesn’t come from choosing a random direction. Each step still moves in the direction that reduces the loss, i.e. along the negative gradient. The stochastic part comes from passing in partial batches of the data: e.g. if your data has 1_000_000 records, you only pass in 32 records at a time, so the gradient is computed on just those 32. The batches are typically randomly reassigned each epoch, so your GD is stochastic through the randomness of where you are on the loss surface and the randomness in the batch.
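Here is a rough sketch of that idea in plain Julia, deliberately not using Flux so nothing about its API is assumed. The toy problem (fitting `y ≈ w*x + b`), the function name `minibatch_sgd`, and the learning rate and epoch count are all made up for illustration. The point is only that every update follows the gradient, but a gradient computed on one randomly drawn batch of 32.

```julia
using Random

# Hypothetical toy example: fit y ≈ w*x + b with mini-batch SGD.
function minibatch_sgd(x, y; batchsize=32, epochs=50, lr=0.1,
                       rng=Random.default_rng())
    w, b = 0.0, 0.0
    n = length(x)
    for epoch in 1:epochs
        # Reshuffle the indices so each epoch gets different random batches.
        idx = shuffle(rng, 1:n)
        for batch in Iterators.partition(idx, batchsize)
            xb, yb = x[batch], y[batch]
            err = w .* xb .+ b .- yb
            # Gradient of the mean squared error on this batch only.
            gw = 2 * sum(err .* xb) / length(batch)
            gb = 2 * sum(err) / length(batch)
            # Plain gradient step: the direction is not random,
            # only the data used to compute it is.
            w -= lr * gw
            b -= lr * gb
        end
    end
    return w, b
end

rng = MersenneTwister(42)
x = rand(rng, 1_000)
y = 3.0 .* x .+ 1.0 .+ 0.01 .* randn(rng, 1_000)
w, b = minibatch_sgd(x, y; rng=rng)
```

With the full data set as a single batch (`batchsize=length(x)`) the same loop becomes ordinary, non-stochastic gradient descent.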

Hope that makes sense.
