One often needs to update immutables in place. For example:
immutable footype_imm
    c1::Int8
    c2::Int8
    c3::Int8
    c4::Int8
    c5_8::Int32
    c9_16::Int64
end
function setfoo(A, idx, val)
    @inbounds begin
        av = A[idx]
        A[idx] = footype_imm(val, av.c2, av.c3, av.c4, av.c5_8, av.c9_16)
        return 0
    end
end
code_native(setfoo, (Vector{footype_imm},Int64,Int8))
.text
Filename:...
pushq %rbp
movq %rsp, %rbp
Source line: 11
shlq $4, %rsi
movq (%rdi), %rax
Source line: 12
movb %dl, -16(%rax,%rsi)
Source line: 13
xorl %eax, %eax
popq %rbp
retq
nopw %cs:(%rax,%rax)
So we see that Julia/LLVM is clever about this: it writes only the updated byte and does not read anything, so it does not stall on an unneeded memory read.
This is very good, but it has different multithreaded semantics from the alternative strategy of reading all the fields and writing them back.
Sometimes LLVM merges multiple writes into a single one (e.g. four movb become one movl); this is very good because it is faster, but it can have (slightly, rarely) different multithreaded semantics.
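For instance, here is a sketch of the kind of update where I would expect such merging (the function name setfoo4 is mine; whether the stores actually get merged is up to LLVM on a given build):

# All four adjacent Int8 fields are replaced, so LLVM is free to coalesce
# the four byte stores into a single 32-bit store.
function setfoo4(A, idx, val::Int8)
    @inbounds begin
        av = A[idx]
        A[idx] = footype_imm(val, val, val, val, av.c5_8, av.c9_16)
        return 0
    end
end

# code_native(setfoo4, (Vector{footype_imm}, Int64, Int8)) shows whether the
# stores were merged.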
For these reasons, and because 99.9% of code does not care about these details, I think it is actually quite a good idea that the multithreaded semantics of such updates of immutables appear to be undefined (this should maybe be documented, though?).
I wanted to ask whether there is a way of getting well-defined multithreaded semantics in the remaining 0.1% of code where this matters.
The most important thing would be a way to update only certain fields in place: that is, a way of writing "setfoo" such that the observed multithreaded semantics are guaranteed, rather than just a maybe-applied compiler optimization.
Say I have multiple threads operating on the same array of bitstypes, where different threads update different fields; see the sketch below. It feels dangerous to rely on this compiler optimization for the correctness of my code, especially since I fear the day when LLVM/Julia becomes so smart that it figures out it can widen 7 movb into 1 movq (a very good idea in most single-threaded code), which would totally break the semantics of multithreaded code relying on this assumption.
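To make the scenario concrete, here is a minimal sketch (the array size, the field values, and the use of Base.Threads.@threads are just illustration, not a recommendation):

A = [footype_imm(0, 0, 0, 0, 0, 0) for _ in 1:1024]

# Two threads touch disjoint fields of the same elements. Whether this is
# race-free hinges on each store really being narrowed to the single byte
# that changed, which is currently only an optimization, not a guarantee.
Base.Threads.@threads for t in 1:2
    for i in 1:length(A)
        av = A[i]
        if t == 1   # this thread only changes c1
            A[i] = footype_imm(Int8(1), av.c2, av.c3, av.c4, av.c5_8, av.c9_16)
        else        # this thread only changes c2
            A[i] = footype_imm(av.c1, Int8(2), av.c3, av.c4, av.c5_8, av.c9_16)
        end
    end
end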
…OK, sure, there is a way involving unsafe_store! and pointer arithmetic, which generates the same native code and should be safe from overzealous LLVM optimizations (I guess? as long as I create barriers with @noinline to prevent the optimizer from becoming too smart?); roughly the sketch below. This is, however, amazingly inconvenient to use.
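A minimal sketch (the helper name setfoo_unsafe! is mine; the layout, 16 bytes per element with c1 at offset 0, is the one code_native showed above):

# Write only the byte of field c1 at element idx, via raw pointers.
# @noinline is meant as a barrier against the optimizer widening the store.
@noinline function setfoo_unsafe!(A::Vector{footype_imm}, idx::Int, val::Int8)
    # pointer(A, idx) points at the idx-th element; c1 lives at offset 0.
    # Other fields would need their fieldoffset added to the address.
    p = convert(Ptr{Int8}, pointer(A, idx))
    unsafe_store!(p, val)   # a single one-byte store, no read of the element
    return nothing
end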