I mostly work with obscure image formats like lossless JPEG, but recently I needed ImageIO to load some other types of images for a training data pipeline. It seems to return images with fixed-point element types, and all of my own Julia image processing code just uses UInt8 / UInt16. For my particular use case, it’s not worth the trouble of dealing with overflows when my image processing routines can just convert each element to floating point and back in the inner loop with similar performance.
There seems to be a chicken-and-egg problem converting FixedPointNumbers-based arrays back to Julia base types, because reinterpret(UInt8, ::Matrix{N0f8}) doesn’t give you back a Matrix{UInt8}; it returns a lazy ReinterpretArray wrapper instead. This causes unnecessary additional compilation and specialization downstream even though it’s functionally equivalent.
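A minimal sketch of the type mismatch described above. To stay self-contained it does not load FixedPointNumbers; `Fixed8` is a hypothetical stand-in for N0f8 (an 8-bit value wrapped in a struct), so the exact type parameters printed for a real N0f8 array will differ.

```julia
# Hypothetical stand-in for N0f8: one UInt8 wrapped in an immutable struct.
struct Fixed8
    i::UInt8
end

A = [Fixed8(UInt8(r + 2c)) for r in 1:2, c in 1:3]   # Matrix{Fixed8}
R = reinterpret(UInt8, A)                             # lazy view, no copy

is_plain_matrix  = R isa Matrix{UInt8}           # false
is_reinterpreted = R isa Base.ReinterpretArray   # true
is_abstract_u8   = R isa AbstractMatrix{UInt8}   # true

# The view shares memory with A, so writes through R show up in A.
R[1, 1] = 0xff
shares_memory = A[1, 1].i == 0xff
```

Any method called with `R` is compiled for the ReinterpretArray type rather than for Matrix{UInt8}, which is where the extra specialization comes from.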
Both potential workarounds have issues:
1. Use unsafe_wrap. The problem there is that if the fixed-point array goes out of scope, it can be garbage collected, leaving my pointer dangling. Storing a reference to it usually means the wrapper ends up being specialized on its type. Maybe I can use Any here, but I have no idea what that may cause the compiler to do behind the scenes.
2. Admit defeat and copy the data.
I ended up just copying the data, but is there an elegant solution to this problem I’m missing somewhere?
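For reference, the copying approach can be written in one line. This is only a sketch: `Fixed8` is a hypothetical stand-in for N0f8 and `to_raw` is an illustrative helper name, not part of any package.

```julia
# Hypothetical stand-in for N0f8.
struct Fixed8
    i::UInt8
end

# Materialize the lazy reinterpret view into a plain Array (one allocation, one copy).
to_raw(A::AbstractArray) = Array(reinterpret(UInt8, A))

A = [Fixed8(UInt8(r + 2c)) for r in 1:2, c in 1:3]
B = to_raw(A)

# B is an independent Matrix{UInt8}; mutating it leaves A untouched.
B[1, 1] = 0x00
```

The cost is a single pass over the image, after which downstream code only ever sees Matrix{UInt8}.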
That particular unsafe_wrap issue can at least be worked around with manual caching. The worst issue with unsafe_wrap is mentioned in its docstring:
Unlike unsafe_load and unsafe_store!, the programmer is responsible also for ensuring that the underlying data is not accessed through two arrays of different element type, similar to the strict aliasing rule in C.
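One possible reading of the manual caching idea is to store the parent array next to the wrapped array so it cannot be collected. The sketch below assumes a hypothetical `Fixed8` stand-in for N0f8 and a hypothetical `RawImage` wrapper. It only addresses the dangling-pointer problem; it does nothing about the aliasing caveat in the docstring, so the parent should not be read or written through its own element type while `raw` is in use.

```julia
# Hypothetical stand-in for N0f8.
struct Fixed8
    i::UInt8
end

# Hypothetical wrapper: `raw` aliases the memory of `owner`. The `owner` field is
# typed Any so RawImage is not specialized on the parent's array type; it exists
# only to keep the parent reachable by the garbage collector.
struct RawImage
    raw::Matrix{UInt8}
    owner::Any
end

function RawImage(A::Matrix{Fixed8})
    p = convert(Ptr{UInt8}, pointer(A))
    # own = false (the default): Julia will not try to free the memory through `raw`.
    return RawImage(unsafe_wrap(Array, p, size(A)), A)
end

A = [Fixed8(UInt8(r + 2c)) for r in 1:2, c in 1:3]
img = RawImage(A)
```

Since `owner` is never used in hot code, the Any field should not cost anything there; downstream functions receive `img.raw`, which is a concrete Matrix{UInt8}.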
The language and compiler assume we won’t read or write more than one type to the same buffer through any kind of reference; in statically typed languages like C, the parallel is pointers with different types. Contrary to the initial expectation that a buffer type is simply a linear sequence of bytes, v1.11’s Memory retains an element type in order to communicate this information to the compiler. We could still build an array type on top of Memory{UInt8} and aggressively reinterpret scalars to and from UInt8 chunks, but it’d sacrifice optimizations and add overhead the same way reinterpret views do now.
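To make the byte-buffer idea concrete, here is a rough sketch of an array type that stores everything as UInt8 and converts on every access. `ByteBacked` is a hypothetical name, and it uses Vector{UInt8} as a stand-in for Memory{UInt8} so that it also runs on versions before v1.11. It relies on unsafe_load and unsafe_store!, which the docstring quoted above exempts from the aliasing restriction.

```julia
# Hypothetical array type: elements of type T live in an untyped byte buffer.
struct ByteBacked{T} <: AbstractVector{T}
    bytes::Vector{UInt8}   # stand-in for Memory{UInt8}
    function ByteBacked{T}(n::Integer) where {T}
        (isbitstype(T) && sizeof(T) > 0) ||
            throw(ArgumentError("T must be a non-empty isbits type"))
        return new{T}(zeros(UInt8, n * sizeof(T)))
    end
end

Base.size(v::ByteBacked{T}) where {T} = (length(v.bytes) ÷ sizeof(T),)
Base.IndexStyle(::Type{<:ByteBacked}) = IndexLinear()

function Base.getindex(v::ByteBacked{T}, i::Int) where {T}
    @boundscheck checkbounds(v, i)
    b = v.bytes
    # Reassemble element i from its bytes.
    return GC.@preserve b unsafe_load(convert(Ptr{T}, pointer(b)), i)
end

function Base.setindex!(v::ByteBacked{T}, x, i::Int) where {T}
    @boundscheck checkbounds(v, i)
    b = v.bytes
    # Split the value into bytes and write them into slot i.
    GC.@preserve b unsafe_store!(convert(Ptr{T}, pointer(b)), convert(T, x), i)
    return v
end

v = ByteBacked{UInt16}(4)
v[2] = 0x1234
w = ByteBacked{Float32}(3)
w[1] = 1.5f0
```

Every element access goes through a raw pointer into the byte buffer, so the compiler cannot treat the storage as a typed array, which is the kind of trade-off described above.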
This rule isn’t universal. Some C projects like the Linux kernel are compiled with an option (-fno-strict-aliasing in GCC and Clang) that tells the compiler not to assume strict aliasing. Accessing one object through incompatible pointer types is still undefined behavior in standard C, and the option disables some compiler optimizations, but it can make low-level programming more straightforward, and its effect is isolated from separately compiled binaries that do assume strict aliasing. LLVM can also accommodate languages like Rust that don’t have the rule to begin with, though it’s worth mentioning that unsafe raw pointers are used much less than references, whose ownership rules give the compiler similar optimization opportunities.