Why is sum always used in differentiation?

It’s just a short way to turn your function into one that returns a scalar, which is what gradient requires. Functions that return an array give an error:

julia> gradient(x -> x.^3, [1 2; 3 4])
ERROR: output an array, so the gradient is not defined. Perhaps you wanted jacobian.
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33

julia> jacobian(x -> x.^3, [1 2; 3 4])[1]
4×4 Matrix{Int64}:
 3   0   0   0
 0  27   0   0
 0   0  12   0
 0   0   0  48
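Summing is not an arbitrary choice. For f mapping n inputs to m outputs, the gradient of the summed output is the vector-Jacobian product with a vector of ones, so each entry is a column sum of the Jacobian. Here x is flattened in column-major order, the same order the 4×4 Jacobian above uses:

$$
\nabla_x \Big(\sum_{i=1}^{m} f_i(x)\Big) = J_f(x)^{\mathsf T}\,\mathbf{1}_m,
\qquad
\Big[\nabla_x \sum_{i=1}^{m} f_i(x)\Big]_j = \sum_{i=1}^{m} \frac{\partial f_i}{\partial x_j}.
$$

For the elementwise f(x) = x.^3 we have $\partial f_i/\partial x_j = 3x_j^2\,\delta_{ij}$, so the Jacobian is diagonal and its column sums are just the diagonal 3, 27, 12, 48. Reshaped back to 2×2 (column-major), that is [3 12; 27 48].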

julia> gradient(x -> sum(x.^3), [1 2; 3 4])[1]
2×2 Matrix{Int64}:
  3  12
 27  48
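The same relationship can be checked without any AD package. The sketch below swaps in central finite differences for automatic differentiation; fd_jacobian is a hypothetical helper written only for this illustration, not part of Zygote or any other library. It builds the Jacobian of x.^3, then sums each column to recover the gradient of the summed function.

```julia
# Hypothetical helper (illustration only): central finite-difference Jacobian.
# Rows index vec(f(x)), columns index vec(x), both in column-major order.
function fd_jacobian(f, x::AbstractArray{<:Real}; h = 1e-6)
    x0 = float.(x)
    y0 = vec(f(x0))
    J = zeros(length(y0), length(x0))
    for j in eachindex(x0)          # linear indices into x0
        e = zero(x0)
        e[j] = h                    # perturb only the j-th input
        J[:, j] = (vec(f(x0 .+ e)) .- vec(f(x0 .- e))) ./ (2h)
    end
    return J
end

f(x) = x .^ 3
x = [1 2; 3 4]

J = fd_jacobian(f, x)                            # ≈ diagonal 3, 27, 12, 48
g = reshape(vec(sum(J; dims = 1)), size(x))      # column sums, reshaped like x

# Column sums are the same as J' * ones, i.e. a vector-Jacobian product.
@assert g ≈ reshape(J' * ones(size(J, 1)), size(x))
println(round.(g; digits = 3))
```

The printed matrix should agree with gradient(x -> sum(x.^3), [1 2; 3 4])[1] up to finite-difference error. For non-elementwise functions the Jacobian is no longer diagonal, but the gradient of the sum is still its column sums.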