OK, so I uploaded the code I used for trying out atomic increments on array elements: https://github.com/tkf/ParallelIncrements.jl
(Comment added: I just realized that the name of the package is completely misleading since it was about single thread performance. Initially, I was thinking about trying it with multiple threads.)
This package contains a function `ref = atomicref(::Array, indices...)`, which can be used for atomic operations like `Threads.atomic_add!(ref, x)`.
As you can see in the README, the overhead is about 10x on my laptop. It can go up to 100x and possibly more, depending on the kind of optimization the compiler can do on the non-atomic version. It might be possible to get a lower overhead using other memory orderings, but I haven’t tried it yet.
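For anyone who wants a rough feel for this kind of single-thread comparison without installing the package, here is a minimal sketch using only Base. Note the assumptions: it does not use `atomicref` at all. Instead it uses a `Vector` of separately allocated `Threads.Atomic{Int}` cells as a stand-in for atomic array elements, so the memory layout differs from a plain `Array` and the ratio you see will not match the README numbers. The function names `count_plain!` and `count_atomic!` are made up for this illustration.

```julia
using Base.Threads: Atomic, atomic_add!

# Plain (non-atomic) histogram-style increment on an ordinary array.
function count_plain!(counts::Vector{Int}, indices)
    for i in indices
        @inbounds counts[i] += 1
    end
    return counts
end

# Same loop, but every slot is its own Atomic{Int} cell.
# ASSUMPTION: this is only a stand-in for atomic operations on
# elements of a plain Array; it is not how the package does it.
function count_atomic!(counts::Vector{Atomic{Int}}, indices)
    for i in indices
        atomic_add!(counts[i], 1)
    end
    return counts
end

indices = rand(1:16, 1_000_000)

# First calls compile both functions and give results to compare.
plain = count_plain!(zeros(Int, 16), indices)
atomic = count_atomic!([Atomic{Int}(0) for _ in 1:16], indices)

# Second calls are timed; numbers vary by machine and Julia version.
t_plain = @elapsed count_plain!(zeros(Int, 16), indices)
t_atomic = @elapsed count_atomic!([Atomic{Int}(0) for _ in 1:16], indices)
println("plain: ", t_plain, " s, atomic: ", t_atomic, " s")
```

Both loops run on a single thread, so the difference is purely the cost of the atomic read-modify-write (plus the extra indirection in this stand-in), not contention. For serious measurements something like BenchmarkTools would be a better choice than a single `@elapsed`.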
I’m not a big fan of this direction, but I’m interested in whatever is beneficial for HPC in Julia. So I’m sharing it in case some people want to try it out (and maybe find some performance bugs I made in the sample code or something).