What you show is only the address of the 1st element.
What I suggest is making sure that no vectorized loop has an “anomaly” to take care of.
I meant something like Intel IPP.
If we define a 1D array, it will be padded so that its size is a multiple of 16 / 32 / 64 bytes.
If we define a 2D array, each row will be padded so that its size is also a multiple of 16 / 32 / 64 bytes.
This way every loop can be unrolled and vectorized with no edge cases to take care of.
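To make the idea concrete, here is a rough sketch in C of what such padding could look like. It is only an illustration under my own assumptions, not how Intel IPP does it internally: the helper names (`round_up`, `padded_stride`, `alloc_padded_2d`, `scale_rows`) are made up, the element type is fixed to `float`, and it relies on C11 `aligned_alloc`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Round n up to the next multiple of `multiple` (multiple must be > 0). */
static size_t round_up(size_t n, size_t multiple)
{
    return (n + multiple - 1) / multiple * multiple;
}

/* Hypothetical helper: row stride, in elements, such that every row
 * occupies a whole number of align_bytes-sized chunks (16 / 32 / 64). */
static size_t padded_stride(size_t cols, size_t elem_size, size_t align_bytes)
{
    assert(elem_size > 0 && align_bytes % elem_size == 0);
    return round_up(cols * elem_size, align_bytes) / elem_size;
}

/* Hypothetical helper: allocate a rows x cols float matrix whose rows are
 * padded to align_bytes. The padding is zeroed so it is safe to compute on.
 * Returns NULL on failure; release the buffer with free(). */
static float *alloc_padded_2d(size_t rows, size_t cols,
                              size_t align_bytes, size_t *stride_out)
{
    size_t stride = padded_stride(cols, sizeof(float), align_bytes);
    size_t bytes = rows * stride * sizeof(float); /* multiple of align_bytes */
    float *p = aligned_alloc(align_bytes, bytes);
    if (p == NULL)
        return NULL;
    memset(p, 0, bytes);
    *stride_out = stride;
    return p;
}

/* The inner loop runs over the full padded stride, so its trip count is
 * always a multiple of the vector width and no remainder loop is needed. */
static void scale_rows(float *a, size_t rows, size_t stride, float s)
{
    for (size_t r = 0; r < rows; ++r) {
        float *row = a + r * stride;
        for (size_t c = 0; c < stride; ++c)
            row[c] *= s;
    }
}
```

With 10 `float` columns and 64 byte padding, each row is 40 bytes of data rounded up to 64 bytes, so the stride is 16 elements. The trade-off is some wasted memory and a bit of extra work on the padding, in exchange for loops without tail handling.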