@keno: This doesn’t make sense to me at all. Why would you use a function that copies bytes to deal with unaligned access (on only one platform, i.e. 32-bit ARM) in a function that normally just loads 64 bits into a register?
There are two ways of dealing with this correctly (and either should be applied only on platforms that require aligned access). The most efficient would be to use a compiler keyword to mark the access as unaligned (on MS compilers, that was
__unaligned; for the ARM compiler, apparently it is
When compiling C, variables are by default architecturally aligned. A global of type int (or uint32_t) will be 4-byte aligned in memory. Similarly, a pointer of type int* is expected to contain a 4-byte aligned address.
Where this is not the case (or may not be the case) the variable or pointer MUST be marked with the __packed keyword. This is a warning to the compiler that this variable, structure or pointer is potentially unaligned.
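As a rough illustration of the first approach, here is a minimal sketch of an unaligned 64-bit load using a packed type. It assumes GCC or Clang and their `__attribute__((packed))` extension rather than the ARM compiler's `__packed` keyword quoted above; the names `unaligned_u64` and `load_u64_unaligned` are hypothetical and do not come from the code under discussion.

```c
#include <stdint.h>

/* Assumption: GCC or Clang. The packed attribute lowers the struct's
 * alignment requirement to 1, so the compiler must not assume that a
 * pointer to it is 8-byte aligned. may_alias permits reading through
 * this type regardless of the type of the underlying buffer. */
struct unaligned_u64 {
    uint64_t value;
} __attribute__((packed, may_alias));

/* Hypothetical helper: load 64 bits in native byte order from an
 * address that may be unaligned. */
static uint64_t load_u64_unaligned(const void *p)
{
    return ((const struct unaligned_u64 *)p)->value;
}
```

On targets that tolerate unaligned loads, a compiler would typically turn this into a single load instruction; on strict-alignment targets it should emit whatever narrower loads are needed.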
The other would be to simply define getblock64 (only for platforms with this problem, i.e. ARM32!) to pick up the bytes individually if necessary (though the code could be made to handle better the cases where the data is sufficiently aligned, even on platforms requiring alignment).
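A byte-by-byte version along those lines might look like the sketch below. The name `getblock64_bytes` and its signature are hypothetical, and the sketch assumes the block is meant to be interpreted as little-endian, which may not match what the real getblock64 does on a big-endian host.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for getblock64: read the i-th 64-bit block
 * starting at p, one byte at a time, so no alignment is required.
 * Assumption: bytes are combined in little-endian order. */
static uint64_t getblock64_bytes(const uint8_t *p, size_t i)
{
    const uint8_t *b = p + i * 8;
    uint64_t v = 0;

    /* Start with the most significant byte (b[7]) and shift it up as
     * the lower bytes are merged in, ending with b[0]. */
    for (int k = 7; k >= 0; k--)
        v = (v << 8) | b[k];

    return v;
}
```

This version always takes the slow path; a fast path for addresses that happen to be 8-byte aligned could be added in front of the loop.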
Related but somewhat off-topic: it would be really nice if Julia had a
Ptr type that indicated to LLVM that the access was potentially unaligned, to handle these cases without compromising performance. That is something that I had to fake in