which is fair, but base points at some device memory that doesn’t support 64-bit reads. We must do two 32-bit reads instead and combine the results as prescribed.
The optimization is not illegal if base points at conventional RAM. In my case it points at a memory-mapped device, and the optimized code doesn’t produce the same result.
In C/C++ you’d typically use #pragma optimize( "", off ) or something along these lines.
If the device is memory-mapped, then what does it mean to say “it doesn’t support 64-bit reads” when your host arch/OS is 64-bit?
If the memory map works correctly (which is the OS’s job, and then your device driver’s job?), shouldn’t it be able to handle whatever “read” call the OS issues?
If you’re DIY-ing/hacking together some kind of memory management, then you still shouldn’t rely on “let’s turn off optimization and hope it compiles to this specific thing”. In that case you may want to use LLVM.jl directly and handcraft exactly what you want.
That doesn’t sound legal. The OS should’ve mapped your device somewhere in the 64-bit address space, unless you are using a 32-bit-only program with 64-bit Julia, but I guess if that were the case things would’ve gone wrong earlier.
Hi @green.nsk, I’ve actually had the same situation arise when reading from memory-mapped hardware performance counter registers. My solution at the time was to put the pointer loads behind functions marked as @noinline to prevent the reads from being coalesced. Since Julia 1.8, call sites can be annotated with @noinline, which would look something like this:
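A minimal sketch of that call-site annotation, assuming Julia 1.8 or later. The names load32 and read64 are hypothetical, and the low-word-first read order and the way the halves are combined are assumptions about what the device prescribes:

```julia
# Hypothetical helper: a plain 32-bit load from a raw pointer.
load32(p::Ptr{UInt32}) = unsafe_load(p)

# Read a 64-bit value as two separate 32-bit loads.
# Call-site @noinline (Julia 1.8+) keeps each load behind a real call,
# which is meant to stop the optimizer from merging them into one 64-bit load.
function read64(base::Ptr{UInt32})
    lo = @noinline load32(base)       # low word at byte offset 0 (assumed order)
    hi = @noinline load32(base + 4)   # Ptr + Integer advances by bytes
    return (UInt64(hi) << 32) | UInt64(lo)
end
```

It can be tried against ordinary memory, for example with buf = UInt32[0x89abcdef, 0x01234567] and GC.@preserve buf read64(pointer(buf)). That only checks the arithmetic, not the behaviour on a real device.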
It’s not ideal because it uses a whole function call just to load a 32-bit integer (i.e. execute a single instruction), but hopefully helps fix the issue.
I think you’re looking for the equivalent of C’s volatile. Julia itself doesn’t expose those semantics, but this should work for your case:
    function volatile_load(x::Ptr{UInt32})
        @inline
        return Base.llvmcall(
            """
            %ptr = inttoptr i64 %0 to i32*
            %val = load volatile i32, i32* %ptr, align 1
            ret i32 %val
            """,
            UInt32,
            Tuple{Ptr{UInt32}},
            x
        )
    end
This assumes you’re on a 64-bit machine (on 32-bit you’d use inttoptr i32 instead). You may have to adjust it in a future version because of opaque pointers, and write load volatile i32, ptr %ptr, align 1 instead. It’s used like
though I’m not 100% sure about the +1 business there, since raw unsafe_load already assumes i starts at 1 (i.e. it takes the 1-indexed conversion into account already). I wrote the above assuming you want offset to be -1-based.
unsafe_load(p::Ptr{T}, i::Integer=1)
Load a value of type T from the address of the ith element (1-indexed) starting at p. This is equivalent to the C expression p[i-1].
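To make the indexing question concrete, here is a small sketch of how the 1-indexed i maps onto a 0-based element offset. It runs against an ordinary array rather than device memory, and index_demo and offset are hypothetical names used only for illustration:

```julia
# Demonstrates that unsafe_load(p, i) reads the element C would call p[i-1].
function index_demo(buf::Vector{UInt32}, offset::Integer)
    # Keep buf alive while we hold a raw pointer into it.
    GC.@preserve buf begin
        p = pointer(buf)
        (
            unsafe_load(p),              # i defaults to 1, i.e. C's p[0]
            unsafe_load(p, 2),           # C's p[1]
            unsafe_load(p, offset + 1),  # 0-based offset needs the +1
        )
    end
end
```

With buf = UInt32[0x11111111, 0x22222222, 0x33333333] and offset = 2, the three loads return the first, second, and third elements respectively.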