Does Δ mean different things in the `apply!` methods of different Flux optimizers?

Hi,

I want to understand the meaning of Δ in the different `apply!` methods (i.e. for different optimizers) in Flux.

For example, in Descent, Δ seems to mean:
[∂f/∂θᵢ] i=1,2,3,…

In AdaGrad, on the other hand, the update also takes the history of gradients over time into account (if I understood AdaGrad correctly), so Δ would mathematically be a diagonal matrix whose ith element is:

[Screenshot of the formula for the ith diagonal element; image not preserved]

Assuming my understanding is correct, how can Δ hold such different kinds of values? It is part of the API: `apply!` is usually called by the `update!` function, which eventually performs x -= Δ. I can't wrap my head around how that operation (even when vectorised) can be valid in both cases.
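
To make the question concrete, here is a minimal sketch of the calling pattern described above, written in plain Julia. All names (`SketchDescent`, `SketchAdaGrad`, `sketch_apply!`, `sketch_update!`) and the constant `ϵ` are hypothetical and chosen for illustration; this is an assumption about how such an API can be structured, not Flux's actual source. In the sketch, Δ always enters as the gradient array (same shape as x) and is overwritten in place with the step to subtract, so a diagonal matrix is never built: scaling elementwise is equivalent to multiplying by one.

```julia
# Hypothetical optimiser types for illustration only; not Flux's definitions.
struct SketchDescent
    eta::Float64
end

struct SketchAdaGrad
    eta::Float64
    acc::IdDict{Any,Any}   # per-parameter running sum of squared gradients
end
SketchAdaGrad(eta) = SketchAdaGrad(eta, IdDict{Any,Any}())

# Δ comes in as the raw gradient and is overwritten with the step to subtract.
function sketch_apply!(o::SketchDescent, x, Δ)
    Δ .*= o.eta
    return Δ
end

function sketch_apply!(o::SketchAdaGrad, x, Δ)
    ϵ = 1e-8  # assumed small constant to avoid division by zero
    acc = get!(() -> zeros(eltype(x), size(x)), o.acc, x)
    @. acc += Δ^2                       # accumulate squared gradients over time
    @. Δ *= o.eta / (sqrt(acc) + ϵ)     # elementwise scaling = diagonal matrix times Δ
    return Δ
end

# The same subtraction works for every optimiser, because the step
# returned by sketch_apply! always has the same shape as x.
function sketch_update!(o, x, Δ)
    x .-= sketch_apply!(o, x, Δ)
    return x
end

x = [1.0, 2.0]
g = [0.5, -4.0]
sketch_update!(SketchAdaGrad(0.1), x, copy(g))
```

If this picture is right, the only thing that differs between optimisers is how the incoming gradient is rescaled before the subtraction, but I would like confirmation that this matches what Flux does.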