I tried to implement the xoshiro RNG for Base Julia (cf. https://github.com/JuliaLang/julia/issues/27614).
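So, for this to work properly, I thought about using something like

```julia
# thread index last: Julia arrays are column-major, so each thread's 8×5 block is contiguous
const global_xorostate = zeros(UInt64, 8, 5, Threads.nthreads())
```

In reality, that would wrap a `ccall(:posix_memalign, ...)`: each thread gets 5 cache lines (5 × 64 bytes = 40 `UInt64`s) to play with.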
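A minimal sketch of that allocation, assuming a POSIX libc that exports `posix_memalign` (so not Windows). `alloc_aligned_state`, `aligned_xorostate` and the constants are hypothetical names. The thread index is the last dimension because Julia arrays are column-major; since each thread's block is 320 bytes (a multiple of 64), every thread's block starts on a cache-line boundary if the base does:

```julia
const CACHE_LINE = 64                                # bytes per cache line (assumed)
const LINES_PER_THREAD = 5
const WORDS_PER_LINE = CACHE_LINE ÷ sizeof(UInt64)  # 8 UInt64 words per line

function alloc_aligned_state(nthreads::Integer)
    nbytes = CACHE_LINE * LINES_PER_THREAD * nthreads
    ptr = Ref{Ptr{Cvoid}}(C_NULL)
    # int posix_memalign(void **memptr, size_t alignment, size_t size)
    err = ccall(:posix_memalign, Cint, (Ptr{Ptr{Cvoid}}, Csize_t, Csize_t),
                ptr, CACHE_LINE, nbytes)
    err == 0 || error("posix_memalign failed with error code $err")
    # Thread index last (column-major), so each thread's 8×5 block is contiguous.
    # own = true: Julia calls free() on the buffer when the array is collected.
    A = unsafe_wrap(Array, Ptr{UInt64}(ptr[]),
                    (WORDS_PER_LINE, LINES_PER_THREAD, Int(nthreads)); own = true)
    fill!(A, zero(UInt64))   # posix_memalign does not zero the memory
    return A
end

const aligned_xorostate = alloc_aligned_state(Threads.nthreads())
```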
Now I am already lost:
```julia
julia> f()=global_xorostate[1]
f (generic function with 1 method)

julia> @code_typed f()
CodeInfo(
1 ─ %1 = invoke Base.getindex(Main.global_xorostate::Array{UInt64,3}, 1::Int64)::UInt64
└──      return %1
) => UInt64
```
```
julia> @code_native f()
	.text
; ┌ @ REPL[12]:1 within `f'
	pushq	%rax
	movabsq	$julia_getindex_16904, %rax
	movabsq	$139722737349536, %rdi  # imm = 0x7F13BC2073A0
	movl	$1, %esi
	callq	*%rax
	popq	%rcx
	retq
	nop
```
This should be a single memory load from a known address, because the compiler should know that `pointer_from_objref(global_xorostate)` cannot change, and it should also know that `pointer(global_xorostate)` cannot change, nor can the size (because the array is not one-dimensional).
So, my question: how do I tell Julia 1.4/master that it is perfectly fine to chase these pointers at compile time instead of at runtime? How do I get rid of the `invoke`?
Or should I try something else?
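One possible "something else", sketched under the assumption that unchecked access is acceptable: skip `getindex` and compute the address from the array's data pointer plus a `Threads.threadid()`-based offset. `thread_offset`, `thread_word` and `set_thread_word!` are hypothetical helpers, and the array is assumed to be laid out as 8 × 5 × nthreads. This may still load the data pointer from the array object at runtime (it is not guaranteed to be a compile-time constant), so check the result with `@code_native` rather than assuming it is the single load you want:

```julia
# Assumed layout: 8 words × 5 cache lines × nthreads, i.e. 40 UInt64 per thread.
const WORDS_PER_THREAD = 8 * 5
const global_xorostate = zeros(UInt64, 8, 5, Threads.nthreads())

# Linear index of word i (1-based) inside the calling thread's block;
# threadid() only enters as an offset, never to load another pointer.
@inline thread_offset(i::Int) = (Threads.threadid() - 1) * WORDS_PER_THREAD + i

# Unchecked read: no getindex call and no bounds check.
@inline function thread_word(i::Int)
    GC.@preserve global_xorostate unsafe_load(pointer(global_xorostate), thread_offset(i))
end

# Unchecked write of x into word i of the calling thread's block.
@inline function set_thread_word!(x::UInt64, i::Int)
    GC.@preserve global_xorostate unsafe_store!(pointer(global_xorostate), x, thread_offset(i))
    return x
end
```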
My problem with a `struct` is that I don't know how to force 64-byte alignment. A secondary problem is that I don't know how to avoid an additional indirection through `Threads.threadid()` with structs: `Threads.threadid()` should only be used as an offset for loads of the payload, not as an offset to load a pointer to the payload. (I mostly know the desired assembly code; my issue is how to coax Julia into emitting it.)
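For the `struct` route, one common trick, sketched here under stated assumptions, is to pad an isbits struct to exactly 64 bytes and keep one per thread in a `Vector`: isbits elements are stored inline, so `STATES[tid]` is a load at `base + (tid - 1) * 64` rather than a load of a pointer. The padding fixes the stride but does not force the `Vector`'s base address to be 64-byte aligned. The update is the reference xoshiro256** step; `PaddedState`, `rotl`, `next_xoshiro`, `STATES` and `thread_rand` are hypothetical names:

```julia
struct PaddedState
    s::NTuple{4,UInt64}     # xoshiro256** state words
    pad::NTuple{4,UInt64}   # unused; pads the struct to 64 bytes (one cache line)
end
PaddedState(s::NTuple{4,UInt64}) = PaddedState(s, ntuple(_ -> zero(UInt64), 4))

rotl(x::UInt64, k::Int) = (x << k) | (x >> (64 - k))

# One xoshiro256** step: returns (output, new state).
function next_xoshiro(st::PaddedState)
    s0, s1, s2, s3 = st.s
    result = rotl(s1 * 5, 7) * 9
    t = s1 << 17
    s2 ⊻= s0
    s3 ⊻= s1
    s1 ⊻= s2
    s0 ⊻= s3
    s2 ⊻= t
    s3 = rotl(s3, 45)
    return result, PaddedState((s0, s1, s2, s3))
end

# One inline 64-byte state per thread, randomly seeded (an all-zero state would be stuck).
const STATES = [PaddedState((rand(UInt64), rand(UInt64), rand(UInt64), rand(UInt64)))
                for _ in 1:Threads.nthreads()]

function thread_rand()
    tid = Threads.threadid()
    r, st = next_xoshiro(@inbounds STATES[tid])
    @inbounds STATES[tid] = st
    return r
end
```

With the seed `(1, 2, 3, 4)`, the first two xoshiro256** outputs are 11520 and 0, which makes a quick sanity check for `next_xoshiro`.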