@kristoffer.carlsson already provided a good concrete example, but maybe I can help provide a theoretical perspective for how changes in the chunk size affect performance.

Generally, raising the chunk size reduces the number of calls that need to be made to the objective function, at the cost of performing additional multiplications at each intermediate operation in the objective function. Raising the chunk size also changes the stack layout (as the extra epsilon components of the `Dual` numbers are stack-allocated) and can thus increase stack pressure.

The chunk size “sweet spot” for any given function is the one that minimizes the number of objective function calls without incurring “too much” multiplication overhead or thrashing the stack.

For example, a function composed of a few very cheap operations may benefit from a relatively low chunk size, since the additional multiplications might be expensive relative to the cost of just calling the function.

Also keep in mind that the number of function calls is inversely related to the chunk size, so the savings from each increase diminish as the chunk size grows. Halving the number of function calls if `N = 1` requires jumping to `N = 2` (costing one additional multiply-per-operation), while halving the number of function calls if `N = 5` requires jumping to `N = 10` (incurring 5 extra multiplies-per-operation for the same relative benefit).
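To put rough numbers on that, here is a small sketch in plain Julia. It assumes the usual chunk-mode accounting, where an input of length `n` needs one objective call per chunk of `N` inputs; the helper name `ncalls` and the input length of 20 are made up for illustration.

```julia
# Number of objective-function calls needed for an input of length n
# with chunk size N (assumed model: one call per chunk, rounded up).
ncalls(n, N) = cld(n, N)

n = 20  # hypothetical input length
for N in (1, 2, 5, 10)
    println("N = ", N, ": ", ncalls(n, N), " calls")
end
```

With `n = 20`, going from `N = 1` to `N = 2` removes 10 calls, while going from `N = 5` to `N = 10` removes only 2, even though the second jump adds five times as many extra multiplies per operation.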

Benchmarking the test functions in DiffBase with different chunk sizes might help build intuition for how different functions respond to different chunk sizes; some of those functions should be more sensitive than others.
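As a starting point for that kind of experiment, a minimal sketch might look like the following. It uses ForwardDiff's `GradientConfig` and `Chunk{N}` to fix the chunk size explicitly; the `rosenbrock` function, the helper `time_chunks`, the input length, and the repetition count are all stand-ins I picked for illustration, not anything taken from DiffBase. The timing loop is deliberately crude, and a proper benchmarking package would give more reliable numbers.

```julia
using ForwardDiff

# Hypothetical test function (a Rosenbrock-style sum); swap in any f(x::AbstractVector).
rosenbrock(x) = sum(100 * (x[i+1] - x[i]^2)^2 + (1 - x[i])^2 for i in 1:length(x)-1)

# Time the same gradient computation under several explicit chunk sizes.
function time_chunks(f, x, chunks; reps = 1000)
    for N in chunks
        cfg = ForwardDiff.GradientConfig(f, x, ForwardDiff.Chunk{N}())
        ForwardDiff.gradient(f, x, cfg)  # warm-up call so compilation is not timed
        t = @elapsed for _ in 1:reps
            ForwardDiff.gradient(f, x, cfg)
        end
        println("chunk size ", N, ": ", t / reps, " s per gradient")
    end
end

x = rand(20)
time_chunks(rosenbrock, x, (1, 2, 5, 10))
```

The gradient values should be the same for every chunk size; only the timings change, and which `N` wins will depend on the function and the machine.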