ForwardDiff with Matrix of Vector as input argument

The timings hardly change:

287.826 ns (1 allocation: 896 bytes) # problem size 10
24.666 μs (4 allocations: 79.08 KiB) # problem size 100
2.399 ms (2 allocations: 7.63 MiB)   # problem size 1000