I ran into a similar problem with Optimization.jl and its analogous OptimizationFunction interface.
I solved it with memoization. In essence, I defined a cache struct and passed it in through the parameter argument of the NonlinearFunction. When df is called, both f and df are computed and the results are stored. If f is then called with the same arguments (checked by equality), the stored result can be returned immediately instead of being recomputed.
I’m on my phone, but I can elaborate with a code example later.
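In the meantime, here is a minimal sketch of the idea in plain Julia, with no package dependency. Everything in it is hypothetical and for illustration only: the names `FGCache`, `compute_both!`, `ensure!`, `f!` and `jac!`, and the toy two-equation residual. The in-place signatures `f!(du, u, p)` and `jac!(J, u, p)` follow the convention I believe NonlinearFunction uses for in-place `f` and `jac`, with the cache passed as `p`.

```julia
# Hypothetical memoization cache: remembers the last input and both results.
mutable struct FGCache{T}
    x::Vector{T}   # last input that was evaluated
    f::Vector{T}   # residual f(x) at that input
    J::Matrix{T}   # Jacobian df(x) at that input
    valid::Bool    # false until the first evaluation
    nevals::Int    # how many times the expensive routine actually ran
end

FGCache{T}(n::Integer, m::Integer) where {T} =
    FGCache{T}(zeros(T, n), zeros(T, m), zeros(T, m, n), false, 0)

# Stand-in for an expensive routine that yields f and df in one pass.
# Toy residual (hypothetical): f(x) = [x1^2 + x2 - 3, x1 - x2^2 + 1]
function compute_both!(c::FGCache, x)
    c.f[1] = x[1]^2 + x[2] - 3
    c.f[2] = x[1] - x[2]^2 + 1
    c.J[1, 1] = 2x[1]; c.J[1, 2] = 1
    c.J[2, 1] = 1;     c.J[2, 2] = -2x[2]
    copyto!(c.x, x)    # copy, so later mutation of x by the caller is harmless
    c.valid = true
    c.nevals += 1
    return c
end

# Recompute only if the input differs from the cached one (equality check).
function ensure!(c::FGCache, x)
    (c.valid && c.x == x) || compute_both!(c, x)
    return c
end

# In-place residual and Jacobian; both read from the shared cache passed as p.
f!(du, u, p::FGCache) = (copyto!(du, ensure!(p, u).f); nothing)
jac!(J, u, p::FGCache) = (copyto!(J, ensure!(p, u).J); nothing)

# Usage: a Jacobian call followed by a residual call at the same u
# runs the expensive routine only once.
cache = FGCache{Float64}(2, 2)
u = [1.0, 2.0]
J = zeros(2, 2); du = zeros(2)
jac!(J, u, cache)
f!(du, u, cache)
println(cache.nevals)
```

Wiring it into a problem would then look roughly like `NonlinearProblem(NonlinearFunction(f!; jac = jac!), u0, FGCache{Float64}(2, 2))`, though I haven't checked this exact call. Two caveats with this design: the cache is typed to `Float64`, so if the solver differentiates `f!` with dual numbers instead of calling `jac!`, the element types won't match; and a single shared cache is not thread-safe.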