I am wondering why LLVM doesn't optimize the following:
julia> mutable struct A
           @atomic x::Int
       end
julia> f(a) = (@atomic :acquire a.x) + (@atomic :acquire a.x);
julia> @code_llvm f(A(1))
; Function Signature: f(Main.A)
; @ REPL[6]:1 within `f`
define i64 @julia_f_1489(ptr noundef nonnull align 8 dereferenceable(8) %"a::A") #0 {
top:
; ┌ @ Base_compiler.jl:78 within `getproperty`
%"a::A.x" = load atomic i64, ptr %"a::A" acquire, align 8
%"a::A.x1" = load atomic i64, ptr %"a::A" acquire, align 8
; └
; ┌ @ int.jl:87 within `+`
%0 = add i64 %"a::A.x1", %"a::A.x"
ret i64 %0
; └
}
The two adjacent loads look obviously redundant – it should be possible to transform this into
f_optim(a) = begin tmp = @atomic :acquire a.x; tmp + tmp end
However, as far as I understand, none of LLVM/Clang, GCC, ICC, or MSVC performs this optimization.
So my questions are two-fold:
- Would that be an admissible optimization under the prevailing memory model?
- Why is it not done?
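For anyone who wants to reproduce the comparison outside the REPL, here is a minimal sketch. `count_atomic_loads` is a hypothetical helper I made up for illustration; it just counts the string "load atomic" in the optimized IR, so treat the printed numbers as a rough indicator and not as a proof of anything.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

mutable struct A
    @atomic x::Int
end

# Original: two adjacent acquire loads of the same field.
f(a) = (@atomic :acquire a.x) + (@atomic :acquire a.x)

# Hand-merged version: one acquire load, result reused.
f_optim(a) = begin
    tmp = @atomic :acquire a.x
    tmp + tmp
end

# Hypothetical helper: count textual occurrences of "load atomic"
# in the optimized LLVM IR of `fn` for the given argument types.
function count_atomic_loads(fn, argtypes)
    ir = sprint(io -> code_llvm(io, fn, argtypes; debuginfo=:none))
    return count("load atomic", ir)
end

a = A(1)
println("f(a)       = ", f(a), ", atomic loads: ", count_atomic_loads(f, (A,)))
println("f_optim(a) = ", f_optim(a), ", atomic loads: ", count_atomic_loads(f_optim, (A,)))
```

Single-threaded, both functions return the same value. The question is only whether the compiler may treat them as equivalent when other threads write to `a.x` concurrently.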
Now, what I actually want is a thread-safe lazy value that can be hoisted out of loops.
That is, something like the following:
julia> mutable struct Lazy{T,Init}
           const init::Init
           @atomic hasrun::UInt8 # 0: uninit, 1: running, 2: error, 3: ready
           item::T # put something invalid here before init
       end
julia> @inline function getLazy(lazy::Lazy)
           if 3 != @atomic :acquire lazy.hasrun
               @noinline initLazy(lazy) # will throw on error and spin/wait if necessary
               @atomic :release lazy.hasrun = 3
           end
           return lazy.item
       end
The internals of initLazy should not matter at all: if I have a sequence
getLazy(lazy)
#something that llvm understands
getLazy(lazy)
then LLVM should do its very best to reorder the second load @atomic :acquire lazy.hasrun upwards, across the #something block (this is allowed if the #something block doesn't forbid it with its own barriers; if I didn't want that, I would have written @atomic :sequentially_consistent lazy.hasrun), see that it can forward from either the store or the previous load, dead-code eliminate the entire second getLazy call, and forward the output of the first getLazy.
But LLVM doesn't even do this for the simplest #something block, namely the empty one.
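To make the empty-block experiment reproducible, here is a self-contained sketch (run it in a fresh session; it assumes Julia 1.8 or newer for const fields and call-site @noinline). The inner constructor, the body of `initLazy`, and the `twice` function are my own assumptions, added only so the snippet runs; `getLazy` is unchanged from above. The `initLazy` shown is just one possible implementation (claim the slot with a compare-and-swap, otherwise spin).

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

mutable struct Lazy{T,Init}
    const init::Init
    @atomic hasrun::UInt8   # 0: uninit, 1: running, 2: error, 3: ready
    item::T                 # left uninitialized until init has run
    # Assumed constructor: start in state 0 and leave `item` unset.
    Lazy{T}(init) where {T} = new{T,typeof(init)}(init, 0x00)
end

# Assumed implementation; the question is independent of these details.
@noinline function initLazy(lazy::Lazy)
    # Try to claim initialization: 0 (uninit) -> 1 (running).
    res = @atomicreplace :acquire_release :acquire lazy.hasrun 0x00 => 0x01
    if res.success
        try
            lazy.item = lazy.init()
        catch
            @atomic :release lazy.hasrun = 0x02   # record the failure
            rethrow()
        end
        return nothing
    end
    # Another task is (or was) initializing: wait for ready or error.
    while true
        state = @atomic :acquire lazy.hasrun
        state == 0x03 && return nothing
        state == 0x02 && error("lazy initialization failed")
        yield()
    end
end

@inline function getLazy(lazy::Lazy)
    if 3 != @atomic :acquire lazy.hasrun
        @noinline initLazy(lazy) # will throw on error and spin/wait if necessary
        @atomic :release lazy.hasrun = 3
    end
    return lazy.item
end

# Two calls with an empty #something block in between.
twice(lazy) = getLazy(lazy) + getLazy(lazy)

calls = Ref(0)
lazy = Lazy{Int}(() -> (calls[] += 1; 42))
println(twice(lazy), " after ", calls[], " call(s) to init")

# Count atomic loads in the optimized IR of `twice`.
ir = sprint(io -> code_llvm(io, twice, (typeof(lazy),); debuginfo=:none))
println("atomic loads in twice: ", count("load atomic", ir))
```

If the second `getLazy` were eliminated as I would like, I would expect to see only a single acquire load of hasrun on the fast path of `twice`.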
Am I insane? Can anyone link me to literature about that?