To expand a bit: training a neural network, like most optimization problems, means minimizing a scalar objective function, so you usually want to define the problem with a scalar in the first place. In fact, the term *gradient* usually refers to the partial derivatives of a scalar function with respect to one or more variables, while *Jacobian* (as in @mcabbott's example) refers to the partial derivatives of a vector-valued function. There are plenty of uses for Jacobians, but gradients are more common in optimization, which is one of the most popular uses for AD.
This makes `sum` more than just a way to get Julia not to error: it's a key part of defining a sensible problem to be solved.
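To make that concrete, here's a minimal sketch assuming Zygote.jl as the AD backend (the functions `f` and `g` and the input `x` are made-up names for illustration): `gradient` wants a function that returns a single number, `jacobian` handles the vector-valued case, and wrapping the output in `sum` is one way to turn the latter into the former.

```julia
using Zygote

f(x) = x .^ 2           # vector -> vector: no single "gradient", only a Jacobian
g(x) = sum(x .^ 2)      # vector -> scalar: a well-posed objective to minimize

x = [1.0, 2.0, 3.0]

Zygote.jacobian(f, x)   # ([2 0 0; 0 4 0; 0 0 6],)  one row per output component
Zygote.gradient(g, x)   # ([2.0, 4.0, 6.0],)        one sensitivity per input component
# Zygote.gradient(f, x) # errors, because the output is an array rather than a scalar
```

Note that the gradient of the summed function is just the column sums of the Jacobian, which is why `sum` is such a natural way to reduce a vector-valued output to something an optimizer can work with.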