Faster sum loop when looping through vector multiple times than once

I find that g is faster on my computer.
The performance difference might be because your CPU can run fewer instructions in parallel when they all write to the same index of res.
Modern CPUs are superscalar: each core can execute multiple instructions at a time. In g, you write to the same res[j] in many consecutive iterations. This prevents the CPU from working on multiple iterations at once, since each iteration has to wait for the previous one to finish. This is not the case for f or h.
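
To make the dependency argument concrete, here is a minimal sketch in C. It is not the f, g, or h from the question; the names sum_single and sum_four are hypothetical, and the single accumulator merely plays the role of a res[j] that is written in every iteration. Whether the second version is actually faster depends on your compiler, flags, and CPU.

```c
#include <stddef.h>

/* One accumulator: every addition needs the result of the previous one,
   so the additions form a single serial dependency chain. */
double sum_single(const double *x, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Four independent accumulators: the four additions inside one iteration
   do not depend on each other, so a superscalar core may overlap them. */
double sum_four(const double *x, size_t n) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    /* Handle the leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        a0 += x[i];
    }
    return (a0 + a1) + (a2 + a3);
}
```

Both functions compute the same sum up to floating-point rounding, because the second one adds the elements in a different order. That reordering is also why a compiler will usually not do this transformation for you on floating-point data unless you explicitly allow it.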

You can get even faster than this if you enable SIMD.
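
As a rough illustration of what enabling SIMD can look like, here is a sketch in C using an OpenMP SIMD reduction. This is an assumption on my part about the setup: the function name sum_simd is hypothetical, the data is assumed to be float, and the pragma only has an effect if the compiler is told to honor it (for example with -fopenmp-simd on GCC or Clang). Without that flag the pragma is ignored and the loop runs as ordinary scalar code.

```c
#include <stddef.h>

/* Sum with an explicit SIMD reduction hint.
   The reduction clause tells the compiler it may keep several partial sums
   in vector lanes and combine them at the end, which reorders the additions. */
float sum_simd(const float *x, size_t n) {
    float acc = 0.0f;
    #pragma omp simd reduction(+:acc)
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}
```

Since the additions may be reordered, the result can differ from a strictly sequential sum in the last few bits.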