Confused about how Zygote pullback handles tuple-valued loss functions

Zygote's `gradient` only handles scalar-valued loss functions. Period. If you want the gradient of a function with two outputs, you are effectively computing a Jacobian, which means you need to take the gradient of the first output, and then of the second.
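
Here is a minimal sketch of what that looks like in practice. The function `f` and the input `x` are made up for illustration and are not from the original question. The first variant calls `gradient` once per output; the second runs the forward pass once with `pullback` and then seeds the backward pass with a one-hot tuple for each output, which should give the same rows of the Jacobian.

```julia
using Zygote

# Hypothetical two-output function, used only for illustration.
f(x) = (sum(abs2, x), sum(x))

x = [1.0, 2.0, 3.0]

# Variant 1: one scalar-valued closure per output, differentiated with `gradient`.
g1 = gradient(x -> f(x)[1], x)[1]   # gradient of the first output
g2 = gradient(x -> f(x)[2], x)[1]   # gradient of the second output

# Variant 2: a single forward pass, then one backward pass per output.
# The seed passed to `back` has the same tuple structure as the output of `f`.
y, back = pullback(f, x)
h1 = back((1.0, 0.0))[1]            # picks out the first output
h2 = back((0.0, 1.0))[1]            # picks out the second output

# Stack the per-output gradients as rows to form the Jacobian.
J = vcat(g1', g2')
```

If all you need is a single scalar loss, it is usually simpler to combine the two outputs first (for example by summing them) and call `gradient` once on that.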