If your array is larger than 2KB, it will automatically be 64-byte aligned, currently (and I think for the foreseeable future).
Otherwise, you can just `ccall` an aligned malloc for your system and `unsafe_wrap` with `own=true`.
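A minimal sketch of that approach, assuming a POSIX system where `posix_memalign` is available and its memory can be released with plain `free` (which is what `own=true` ends up calling). The helper name `aligned_vector` and its `alignment` keyword are made up for illustration. On Windows, `_aligned_malloc` memory must be released with `_aligned_free`, so `own=true` would not be safe there.

```julia
# Hypothetical helper: allocate a Vector{T} whose data pointer is
# `alignment`-byte aligned, using posix_memalign (POSIX only).
function aligned_vector(::Type{T}, n::Integer; alignment::Integer = 64) where {T}
    isbitstype(T) || throw(ArgumentError("element type must be isbits"))
    n >= 0 || throw(ArgumentError("length must be non-negative"))

    # posix_memalign writes the allocated address into this Ref.
    ptr_ref = Ref{Ptr{Cvoid}}(C_NULL)
    nbytes = max(n * sizeof(T), 1)  # avoid a zero-byte request

    ret = ccall(:posix_memalign, Cint,
                (Ref{Ptr{Cvoid}}, Csize_t, Csize_t),
                ptr_ref, alignment, nbytes)
    ret == 0 || error("posix_memalign failed with code $ret")

    # own = true hands the buffer to Julia's GC, which will call free on it.
    return unsafe_wrap(Array, Ptr{T}(ptr_ref[]), n; own = true)
end

A = aligned_vector(Float64, 100)
fill!(A, 0.0)                      # the memory is uninitialized after allocation
@assert UInt(pointer(A)) % 64 == 0 # data pointer is 64-byte aligned
```

Note that `alignment` must be a power of two and a multiple of `sizeof(Ptr{Cvoid})`, otherwise `posix_memalign` returns an error code.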
The only thing you don’t get this way, for arrays with sizeof(A) < 2KB, is the nice inline layout where the data starts one cache line after the header (which is good for the prefetcher when iterating from the start). But for arrays >16KB you don’t get this nice arrangement anyway, and I’d guess that is your primary use case.