Scattered Atomic Writes Into Array

cshenton · September 21, 2020, 9:53am

I’m porting some OpenCL code that does scattered atomic writes. These writes are quite sparse, meaning that threads rarely try to write to the same cache line at the same time.

I’m trying to figure out how to do this in Julia but am having some trouble. The atomic operations seem to operate only on individual boxed primitives, which isn’t very useful for me here, since I need to do atomic operations on the elements of an array.
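The limitation described above can be seen in a minimal sketch: Base's `Threads.Atomic{T}` wraps a single value, so the closest Base-only "atomic array" is a vector of separately boxed atomics. The function name `scatter_add_atomic!` and the test data are hypothetical, chosen only for illustration.

```julia
using Base.Threads

# Hypothetical workaround: one boxed Atomic per element. Each element is a
# separate heap object, so this costs an extra indirection and extra memory
# per element compared with a plain Vector{Float64}.
function scatter_add_atomic!(acc::Vector{Atomic{Float64}},
                             idx::AbstractVector{<:Integer},
                             vals::AbstractVector{Float64})
    @threads for k in eachindex(idx, vals)
        atomic_add!(acc[idx[k]], vals[k])  # atomic read-modify-write on one element
    end
    return acc
end

n = 8
acc = [Atomic{Float64}(0.0) for _ in 1:n]
idx = [mod1(k, n) for k in 1:1000]          # each of the 8 slots is hit 125 times
vals = ones(Float64, 1000)
scatter_add_atomic!(acc, idx, vals)
result = [a[] for a in acc]                 # unwrap into a plain Vector{Float64}
```

This works, but the data no longer lives in one contiguous array, which is exactly what the question is trying to avoid.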

Is there some workaround?

tisztamo · September 21, 2020, 12:07pm

I am not an expert so this is more like a question:

My first idea is to create a smaller array of locks and map each lock to a contiguous region of the original array. Would that be highly suboptimal?
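The lock-per-region idea can be sketched as follows, assuming short critical sections so that `Threads.SpinLock` is a reasonable choice. The type `StripedArray` and the helpers `lockfor` and `locked_add!` are hypothetical names, not part of Base.

```julia
using Base.Threads

# Hypothetical type: a plain data array guarded by a smaller array of locks,
# where lock j protects the contiguous block of `block` elements
# starting at index (j - 1) * block + 1.
struct StripedArray{T}
    data::Vector{T}
    locks::Vector{Threads.SpinLock}
    block::Int
end

# Build an n-element zeroed array with one lock per block of `block` elements.
function StripedArray{T}(n::Integer, block::Integer) where {T}
    nlocks = cld(n, block)
    return StripedArray{T}(zeros(T, n), [Threads.SpinLock() for _ in 1:nlocks], Int(block))
end

# Map element index i to the lock that guards its block.
lockfor(s::StripedArray, i::Integer) = s.locks[(i - 1) ÷ s.block + 1]

# Read-modify-write of one element while holding its block's lock.
function locked_add!(s::StripedArray, i::Integer, v)
    lock(lockfor(s, i)) do
        s.data[i] += v
    end
    return nothing
end

s = StripedArray{Float32}(1000, 64)          # 16 locks for 1000 elements
@threads for k in 1:10_000
    locked_add!(s, mod1(7k, 1000), 1f0)      # 7 is coprime to 1000: each index is hit 10 times
end
```

The block size trades memory for contention: with sparse scatters, as described in the question, a fairly coarse block should rarely see two threads competing for the same lock.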

cshenton · September 21, 2020, 12:12pm

That would certainly be a reasonable thing to try. In this case I don’t have much contention over small regions of the array, so the cost of what you suggested would be comparable to the cost of the atomic operations, though a little more awkward because of the extra resource management, etc.

That’s probably what I’ll do as a backup (or just let the scatter part of the code be single threaded).

foobar_lv2 · September 21, 2020, 12:30pm

I am assuming you are on x86_64?

Good news is that naturally aligned writes of 1-8 bytes are always atomic. Bad news is that atomic writes of larger things are unsupported (they need some locking structure [Edit: So x86_64 does have cmpxchg16b. TIL]). This also means that atomic ops on structs larger than 8 bytes are unsupported on your hardware.
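The 1-8 byte guarantee applies to naturally aligned accesses, i.e. addresses that are a multiple of the access size. A minimal sketch of a hypothetical helper `is_naturally_aligned` for checking a pointer obtained from `pointer(some_array, index)`; for an ordinary `Vector` of a bits type every element is expected to pass, but data reached through `reinterpret` or `unsafe_wrap` is worth checking.

```julia
# Hypothetical helper: true if the address in p is a multiple of sizeof(T).
is_naturally_aligned(p::Ptr{T}) where {T} = UInt(p) % sizeof(T) == 0

a = zeros(Float32, 16)
# Keep `a` rooted while raw pointers into it are in use.
aligned = GC.@preserve a all(is_naturally_aligned(pointer(a, i)) for i in eachindex(a))

misaligned = is_naturally_aligned(Ptr{UInt64}(UInt(12)))  # 12 is not a multiple of 8: false
```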

You probably know this, and actually need things like atomic_add, atomic_rmw and atomic_cas on e.g. Ptr{UInt64} extracted from pointer(some_array, index)?

In that case, you should take a look at https://github.com/JuliaLang/julia/blob/master/base/atomics.jl

Base only defines atomic operations for the boxed primitives – but you can just extend them to Ptr{your_needed_primitive} (copy-paste the code with minimal adjustments). Yes, this is type piracy; but this is imo the OK kind of type piracy (there is only one canonical definition that makes sense).

In case the macro-heavy code from Base is too annoying to follow, you want e.g.:

julia> Threads.atomic_add!(p::Ptr{UInt64}, v::UInt64) = Core.Intrinsics.llvmcall("""%ptr = inttoptr i64 %0 to i64*
       %rv = atomicrmw add i64* %ptr, i64 %1 acq_rel
       ret i64 %rv""",
       UInt64,
       Tuple{UInt64, UInt64},
       reinterpret(UInt64, p), v)
julia> a = [UInt64(4), UInt64(5)]
julia> GC.@preserve a Threads.atomic_add!(pointer(a, 2), a[1])
0x0000000000000005

If you dislike type piracy, then just call your function my_atomic_add!.

cshenton · September 21, 2020, 12:48pm

Sorry, my original phrasing may have been unclear. I want to atomically increment Float32 elements in an array, so it’s a read-modify-write, not just a write.
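An atomic Float32 increment is a read-modify-write, and when the hardware only offers compare-and-swap it is usually built as a CAS loop on the value's bit pattern. A minimal sketch of that loop, assuming Base's boxed `Threads.Atomic{UInt32}` as the storage cell; `cas_add!` is a hypothetical name. (Base's `Threads.atomic_add!` already accepts an `Atomic{Float32}` directly; the loop is shown because the same pattern carries over to a `cmpxchg` on a raw pointer like the llvmcall approach above.)

```julia
using Base.Threads

# Hypothetical CAS-loop Float32 add on a cell storing the float's UInt32 bits.
# Returns the previous value, like Threads.atomic_add! does.
function cas_add!(cell::Atomic{UInt32}, v::Float32)
    old = cell[]                                         # current bit pattern
    while true
        new = reinterpret(UInt32, reinterpret(Float32, old) + v)
        seen = atomic_cas!(cell, old, new)               # returns the value that was in the cell
        seen === old && return reinterpret(Float32, old) # swap succeeded
        old = seen                                       # another thread won: retry with its value
    end
end

cells = [Atomic{UInt32}(reinterpret(UInt32, 0f0)) for _ in 1:4]
@threads for k in 1:4000
    cas_add!(cells[mod1(k, 4)], 0.5f0)                   # each cell receives 1000 increments
end
vals = [reinterpret(Float32, c[]) for c in cells]
```

Comparing bit patterns rather than floats keeps the loop well defined even for values such as NaN or -0.0.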

Thanks for the link. A bit of type piracy is alright by me. I might just try and manually unroll the templated llvmcalls in that file. Hopefully once all the multithreading APIs settle down there will at least be versions of these functions defined for Ptr types.

cshenton · September 21, 2020, 2:11pm

Oh I’ve just seen your edit. Thanks! That clarifies things a lot for me.

foobar_lv2 · September 21, 2020, 3:41pm

I forgot the second relevant link: the LLVM Language Reference Manual.

Also https://github.com/JuliaLang/julia/issues/32455

I agree that this is a shortcoming in the exposed API. Feel free to comment on the issue on github!

Someone there or here on Discourse (maybe myself, or yourself) is likely to submit a PR if there is enough popular demand.

PS: https://github.com/JuliaLang/julia/pull/37683
