Fortunately, it did not matter… ;-)
The corrected timings are below:
loop1 1.125 ms (79491 allocations: 2.82 MiB)
loop2 84.387 μs (103 allocations: 322.83 KiB)
loop3 14.132 μs (2 allocations: 234.45 KiB) <-- steveng's loop
loop3 1.089 ms (79491 allocations: 2.82 MiB) <-- steveng's loop, but with .= instead of direct assignment
rv 26.252 ms (215531 allocations: 81.67 MiB)
sa 87.650 μs (20 allocations: 769.02 KiB) <-- corrected
ra 157.509 μs (85 allocations: 772.58 KiB)
rh 22.710 ms (163682 allocations: 80.42 MiB)