I am still having a hard time figuring out what Flux.jl actually does. One important aspect seems to be making gradients available automatically via tracking. How does that differ from the approach in ForwardDiff.jl? In other words, why is it not possible, feasible, or sensible to get the gradient of my loss with respect to all parameters using ForwardDiff?
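To make the ForwardDiff part of the question concrete, here is a minimal sketch of what I have in mind. It assumes a toy linear model whose parameters are packed into a single vector; the names `loss`, `θ`, `x`, and `y` are made up for illustration and are not part of Flux.jl.

```julia
using ForwardDiff

# Toy data (hypothetical, for illustration only)
x = [1.0, 2.0, 3.0]
y = 2.0

# Linear model: the first three entries of θ are weights, the last one is the bias.
# The loss is the squared error of the single prediction.
loss(θ) = (sum(θ[1:3] .* x) + θ[4] - y)^2

θ = [0.1, 0.2, 0.3, 0.0]

# Gradient of the loss with respect to all parameters at once
g = ForwardDiff.gradient(loss, θ)
```

This works for four parameters; my question is what goes wrong, or becomes impractical, when the same idea is applied to a whole model.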
The current tracking implementation seems to be very similar to what TensorFlow does with eager execution, right?