To clarify, “reverse mode” AD is efficient when you have a function f(x) with a small number of outputs f_i and many inputs x_j (in computing \partial f_i/\partial x_j), i.e. for functions mapping x\in\mathbb{R}^m to f \in \mathbb{R}^n with n \ll m. (For example, in neural-network training, where you want the derivative of one loss function (n=1) with respect to millions (m) of network parameters.) (The “manual” application of such a technique is also known as an adjoint method, and in the neural-net case it is called backpropagation.)
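For instance, here is a minimal sketch of the n=1, large-m case, using Zygote.jl (one of several reverse-mode AD packages in Julia; it is not mentioned above, so treat the choice of package and the toy `loss` function as illustrative assumptions):

```julia
using Zygote

# Hypothetical scalar "loss" of many inputs: n = 1 output, m = 10^6 inputs.
loss(x) = sum(abs2, x) / length(x)

x = randn(10^6)

# Reverse mode computes the full gradient ∂loss/∂x_j for all j in roughly
# one extra "backward" pass, regardless of how large m is.
g = Zygote.gradient(loss, x)[1]   # gradient vector of length 10^6
```

The key point is that the cost of the gradient is a small constant multiple of the cost of evaluating `loss` once, rather than scaling with the number of inputs m.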
In contrast, forward-mode AD (as in ForwardDiff.jl) is better when there are a small number of inputs and a large number of outputs (n \gg m), i.e. when you are computing many functions of a few variables. (It essentially corresponds to a “manual” application of the chain rule in the most obvious way.)
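A corresponding sketch of the small-m, large-n case with ForwardDiff.jl (the specific function `f` below is just a made-up example):

```julia
using ForwardDiff

# Few inputs (m = 2), many outputs (n = 1000): forward mode needs only
# about m "dual-number" evaluations to fill in the whole Jacobian.
f(x) = [sin(k * x[1]) + cos(k * x[2]) for k in 1:1000]

x = [0.3, 0.7]

J = ForwardDiff.jacobian(f, x)    # 1000 × 2 Jacobian matrix ∂f_i/∂x_j
```

Here the cost scales with the number of inputs m, which is small, so forward mode is cheap even though there are many outputs.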