To expand a bit: training a neural network, like most optimization problems, means minimizing a scalar objective function, so you usually want to define the problem with a scalar in the first place. In fact, the term *gradient* usually refers to the partial derivatives of a scalar function with respect to one or more variables, while *Jacobian* (as in @mcabbott's example) refers to the partial derivatives of a vector-valued function. There are plenty of uses for Jacobians, but gradients are more common in optimization, which is one of the most popular uses for AD.
This makes `sum` more than just a way to get Julia not to error: it's a key part of defining a sensible problem to be solved.
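To make that concrete, here's a minimal sketch assuming Zygote.jl as the AD backend (the functions `f` and `g` and the input `x` are made-up names for illustration): `gradient` wants a function that returns a single number, `jacobian` handles the vector-valued case, and wrapping the output in `sum` is one way to turn the latter into the former.

```julia
using Zygote

f(x) = x .^ 2           # vector -> vector: no single "gradient", only a Jacobian
g(x) = sum(x .^ 2)      # vector -> scalar: a well-posed objective to minimize

x = [1.0, 2.0, 3.0]

Zygote.jacobian(f, x)   # ([2 0 0; 0 4 0; 0 0 6],)  one row per output component
Zygote.gradient(g, x)   # ([2.0, 4.0, 6.0],)        one sensitivity per input component
# Zygote.gradient(f, x) # errors, because the output is an array rather than a scalar
```

Note that the gradient of the summed function is just the column sums of the Jacobian, which is why `sum` is such a natural way to reduce a vector-valued output to something an optimizer can work with.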