Alright, for posterity, and since the documentation for valloc seems quite sparse, here’s the version I ended up with. After changing to use a pointer, @simd no longer makes a difference for me. I also optimized the iterator according to our recent discussion. Finally, with valloc, there’s no need to handle the remaining elements at the end separately, since there’s extra room past the end to write an additional vector element.
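using SIMD  # provides Vec, valloc, and vstorent

function test_nt!(a::SubArray, n)
    v = Vec{4,Float64}((0, 1, 2, 3))
    # Each iteration writes 4 consecutive elements with a non-temporal store.
    for i = 0:(n-1)>>2
        vstorent(4i + 1 + v, pointer(a, 4i + 1))
    end
end

# Aligned for 8-element vectors; the final store may run past element 199.
a = valloc(Float64, 8, 199)
test_nt!(a, 199)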
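
To make the "no remainder handling" point concrete, here is a small sketch of the index arithmetic in plain Julia (it does not need SIMD.jl). The helper names nstores, last_written, and padded_len are made up for illustration, and padded_len encodes an assumption: that valloc(T, N, sz) leaves room up to the next multiple of N elements. If that holds, a 4-wide loop over a buffer allocated with N = 8 should never write past the padded end.

```julia
# Number of 4-wide stores issued by the loop `for i = 0:(n-1)>>2`.
nstores(n) = ((n - 1) >> 2) + 1

# Highest 1-based index written: the last store starts at 4i + 1
# and covers 4 consecutive elements.
last_written(n) = 4 * nstores(n)

# ASSUMPTION: valloc pads the buffer up to the next multiple of N elements.
padded_len(n, N) = cld(n, N) * N

n = 199
println(nstores(n))           # 50
println(last_written(n))      # 200
println(last_written(n) - n)  # 1
println(padded_len(n, 8))     # 200
```

For n = 199 the last store covers elements 197 to 200, one element past the logical end, which fits in the assumed padding. Since any multiple of 8 is also a multiple of 4, the same should hold for any other n under that assumption.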