Advice on using unsafe_wrap to view integers as atomic integers

I’m trying to view the same memory region as a vector of a different type of the same size. The goal is to view a (large) vector of integers as a vector of atomic integers (counts), so that parallel threads can perform atomic operations to collect the value of these integers (loop over some very large data and increment the relevant counts - the increments are sparse enough that using atomics should make sense). Once the collection is done, I want to use the result (the total counts) as a normal vector of integers for further processing. Here is sample code which segfaults using Julia 1.6.1 - what am I doing wrong?

using Base.Threads

size = 10  # The value doesn't matter to the behavior below as long as it is positive.

# Start with a vector of integers (Int32).
vector_of_int = zeros(Int32, size)
println("vector_of_int = $(vector_of_int)")

# Obtain a pointer to the first element.
pointer_to_int = pointer(vector_of_int)
println("pointer_to_int = $(pointer_to_int)")

# Sanity test - viewing integers as floats.

# Reinterpret the pointer to an integer as a pointer to a float (Float32).
pointer_to_float = reinterpret(Ptr{Float32}, pointer_to_int)
println("pointer_to_float = $(pointer_to_float)")

# Wrap the pointer in a vector of floats.
wrap_of_float = unsafe_wrap(Vector{Float32}, pointer_to_float, (size,))
println("wrap_of_float = $(wrap_of_float)")

# Demonstrates this works by setting a value through the floats view.
wrap_of_float[1] = 1.0
println("set! wrap_of_float = $(wrap_of_float)")

# Show that this modified the original integers vector.
println("vector_of_int = $(vector_of_int)")

# So far, so good. Now for what doesn't work...

# Obtain a pointer to an atomic integer.
pointer_to_atomic_int = reinterpret(Ptr{Atomic{Int32}}, pointer_to_int)
println("pointer_to_atomic_int = $(pointer_to_atomic_int)")

# Wrap the pointer in a vector of atomic integers: SEGFAULT - WHY?
wrap_of_atomic_int = unsafe_wrap(Vector{Atomic{Int32}}, pointer_to_atomic_int, (size,))
println("wrap_of_atomic_int = $(wrap_of_atomic_int)")

# Goal of the exercise - be able to perform atomic operations (in parallel threads).
atomic_add!(wrap_of_atomic_int[1], 1)

# Show that this modified the original integers vector.
println("vector_of_int = $(vector_of_int)")

This produces the output:

vector_of_int = Int32[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
pointer_to_int = Ptr{Int32} @0x00007f916ec5a5d0
pointer_to_float = Ptr{Float32} @0x00007f916ec5a5d0
wrap_of_float = Float32[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
set! wrap_of_float = Float32[1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
vector_of_int = Int32[1065353216, 0, 0, 0, 0, 0, 0, 0, 0, 0]
pointer_to_atomic_int = Ptr{Atomic{Int32}} @0x00007f916ec5a5d0

signal (11): Segmentation fault
in expression starting at /home/obk/Test.jl:48
unknown function (ip: 0x7f9172e4f2b2)
show_default at ./show.jl:395 [inlined]
show at ./show.jl:390 [inlined]
show_delim_array at ./show.jl:1092
show_delim_array at ./show.jl:1081 [inlined]
show_vector at ./arrayshow.jl:490
show_vector at ./arrayshow.jl:475 [inlined]
show at ./arrayshow.jl:446 [inlined]
print at ./strings/io.jl:35
unknown function (ip: 0x7f917304a3de)
string at ./strings/io.jl:174
unknown function (ip: 0x7f9184a6ee15)
unknown function (ip: 0x7f9184a6e9d5)
unknown function (ip: 0x7f9184a6f41b)
unknown function (ip: 0x7f9184a700d0)
unknown function (ip: 0x7f9184a8b457)
unknown function (ip: 0x7f9184a8c0fc)
jl_toplevel_eval_in at /usr/bin/../lib/julia/ (unknown line)
unknown function (ip: 0x7f9172fdd9f7)
unknown function (ip: 0x7f9173035338)
unknown function (ip: 0x7f9173034db2)
unknown function (ip: 0x7f9172d686cc)
unknown function (ip: 0x7f9172d6adec)
unknown function (ip: 0x7f9172d6af55)
unknown function (ip: 0x7f9184aae76e)
repl_entrypoint at /usr/bin/../lib/julia/ (unknown line)
main at julia (unknown line)
__libc_start_main at /usr/bin/../lib/ (unknown line)
_start at julia (unknown line)
Allocations: 1498826 (Pool: 1498247; Big: 579); GC: 2

If you want to manipulate integers in an array atomically, see TSXPlayground.jl/UnsafeAtomics.jl at master · tkf/TSXPlayground.jl · GitHub and its usage. Not sure how much of this is “legal” though.

unsafe_wrap(Vector{Atomic{Int32}}, pointer(::Vector{Int32}), _) doesn’t work because it means interpreting two consecutive Int32s as a pointer to the memory location of the mutable struct Atomic{Int32} (assuming a 64-bit machine).
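A quick way to see this distinction (a minimal sketch using the public `isbitstype` predicate; this is my illustration, not part of the original posts):

```julia
using Base.Threads

# Float32 is an isbits type, so a Vector{Float32} stores the 4-byte values
# inline and reinterpreting the underlying memory is meaningful.
isbitstype(Float32)        # true

# Atomic{Int32} is a mutable struct, so a Vector{Atomic{Int32}} stores
# heap references (pointers), not inline 4-byte values.
isbitstype(Atomic{Int32})  # false
```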

1 Like

TSXPlayground is awesome! But it is x86 specific, right?

I don’t understand “interpreting two consecutive Int32s as a pointer to the memory location of the mutable struct Atomic{Int32} (assuming 64 bit machine)”. Note that sizeof(Atomic{Int32}) == sizeof(Int32) == 4 == sizeof(Float32), so if the code works for casting a vector of Int32 to a vector of Float32, why doesn’t it work when casting to a vector of Atomic{Int32}?

julia> Core.Compiler.allocatedinline(Int32)
true

julia> Core.Compiler.allocatedinline(Threads.Atomic{Int32})
false
1 Like

The TSX part is Intel-specific. But the UnsafeAtomics.jl part only uses vanilla LLVM (it’s mostly based on how Threads.Atomic is defined). I don’t think I’ve tested it on a non-Intel machine, though.

Also, adding to Kristoffer’s point, maybe a simple demo helps:

julia> a = [Threads.Atomic{Int32}(12345)];

julia> unsafe_load(unsafe_load(Ptr{Ptr{Int32}}(pointer(a, 1))))
12345

julia> b = [12345];

julia> unsafe_load(Ptr{Int32}(pointer(b, 1)))
12345
1 Like

Looking at the implementation in julia/atomics.jl at 1b93d53fc4bb59350ada898038ed4de2994cce33 · JuliaLang/julia · GitHub it seems Atomic{T} is just a simple normal struct with a single member of type T, so the memory layout of Atomic{T} and T “should” be the same. This would also be consistent with the way that atomic types are implemented in most other languages (C++, Rust, etc.).

However, obviously this reasoning is wrong or my original code would work. Trying Search · The Julia Language gives empty results; @kristoffer.carlsson, could you please elaborate on how allocatedinline works under the hood? EDIT: I see from @tkf’s reply that there’s an additional level of indirection (that is, a vector of Atomic{Int32} is actually a vector of 8-byte pointers to 4-byte values). I don’t see why that should be - is there a good reference explaining this?
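The indirection can be made concrete by comparing the data sizes of the two vectors (a small sketch of my own; the 8-byte figure assumes a 64-bit machine):

```julia
using Base.Threads

ints = zeros(Int32, 3)
atomics = [Atomic{Int32}(0) for _ in 1:3]

# sizeof on an Array reports the size of its element storage.
sizeof(ints)     # 12: three inline 4-byte Int32 values
sizeof(atomics)  # 24: three 8-byte pointers to heap-allocated Atomic objects
```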

More generally: Is there any way in Julia to take a normal integer (in a vector) and view it as an atomic integer (that is, perform atomic_add! on it) using any combination of Vector, Ptr, pointer_from_objref, pointer, reinterpret, convert, unsafe_convert or similar? All the combinations I tried so far either (1) failed to run, (2) gave the wrong semantics since atomic_add! ended up applying to an independent copy of the original integer, or (3) crashed mysteriously (mysteriously for me, that is).

Also - @tkf - was I hasty in saying that TSXPlayground.jl/UnsafeAtomics.jl at master · tkf/TSXPlayground.jl · GitHub is X86-specific? I jumped to this conclusion because I saw what seemed to be assembly instructions, but in retrospect, these might be LLVM-IR instructions, which are not specific to a particular CPU architecture. If so, then I can adapt the code there into mine (all I need is the atomic_add! operation). If so, a question - is it safe to use the more efficient TSX variant - specifically, would it work on Apple M1 (ARM architecture) as well as X86? If so, does the performance gap in favor of TSX apply in that CPU architecture as well? EDIT: Ugh, cross-posted w/ your reply. I see it is LLVM-IR and that the atomics part “should” be portable while the TSX part is X86-specific. Thanks!

It is mutable so the layout is not the same:

julia> a = Threads.Atomic{Int}(2);

julia> b = [a, a];

julia> a[] = 3
3

julia> b
2-element Vector{Base.Threads.Atomic{Int64}}:
 Base.Threads.Atomic{Int64}(3)
 Base.Threads.Atomic{Int64}(3)

There is a whole new atomic API coming in 1.7 that might interest you. There was a JuliaCon presentation about it:

1 Like

@kristoffer.carlsson - thanks for the link to the presentation and for the explanation.

I suppose that whenever I have a mutable struct Foo ... end, then a Vector{Foo} will actually be implemented as an array of pointers to heap-allocated Foo objects rather than directly as an array of consecutive Foo objects. That would be consistent with Java, Python etc. and makes sense for a GC language. My only surprise is that such a mutable struct was chosen to implement Atomic{T} - definitely not what I expected given the zero-cost implementation in C/C++/Rust/etc.

Hopefully @atomic would allow for this zero-cost abstraction (though that wasn’t explicitly stated in the talk).
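For reference, a minimal sketch of the per-field atomics API that lands in 1.7 (requires Julia ≥ 1.7; the `Counter` struct here is my own made-up example, not from the talk):

```julia
# Declaring a field @atomic makes all accesses to it atomic, with no
# extra heap allocation per value - the field is stored inline.
mutable struct Counter
    @atomic n::Int
end

c = Counter(0)
@atomic c.n += 1   # atomic read-modify-write on the field
@atomic c.n        # atomic read (sequentially consistent by default)
```

Note that plain non-atomic access (`c.n`) to a field declared `@atomic` throws a ConcurrencyViolationError, which is exactly the "no mixing" restriction discussed below.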

At least now all is clear - many thanks!

If the Atomic itself were immutable, you’d have an atomic reference. The current Atomic has to be able to change its value, though, so it has to be mutable.

@Sukera - I understand the reasoning, it does seem inevitable when viewing Atomic{T} as a struct. Ideally, it would have been a primitive type instead, which would have side-stepped the issue, but that would have been way less convenient to implement. Actually, ideally there wouldn’t be an Atomic data type at all, instead the atomic operations would work directly on Ptr{T} or Ref{T} and be done, which would have been easier to implement and more useful, but that’s a whole different discussion…

That’s more or less what will be available in 1.7. Just without having to do Ref and Ptr manually.

Very much debatable! A lot of consideration and thought went into how to do this for 1.7; check out the atomics manifesto for more info. Also, getting that support into the language required quite a bit of code, so I definitely don’t think it would have been easier than the “it’s a mutable struct” solution (which you also won’t get rid of entirely - having a vector of individual atomic elements still requires that kind of wrapper).

@Sukera - Debatable indeed. I think it is a matter of the requirements. The key point seems to be: “All writes to a field declared as atomic should only be done with compatible atomic memory orderings. Mixing and matching between atomic and non-atomic specifications on a memory location is prohibited (except in the constructor, where all writes are done non-atomically).”

My problem is that my use case directly contradicts this requirement. I agree that, given this requirement, the decisions made are reasonable. My argument is that this requirement is overly restrictive - there are valid use cases where one very much wants to mix a phase of atomic operations with a following phase of normal operations on the same fields/data. And if one removes this requirement, it makes sense to discuss the option of simply providing a series of functions atomic_do_something!(::Union{Ref{T},Ptr{T}}, ...).

That said, I agree that in other cases this requirement makes a lot of sense and supporting it is important. I guess that in an ideal world we’d have both? An @atomic for atomic-only-operations and something like the UnsafeAtomics.jl module for the “sometimes atomic operations, sometimes normal operations” use cases - which one can view as “unsafe”.

Perhaps what I’m really asking for here is for @tkf to publish UnsafeAtomics.jl as a proper Julia package. BTW - I just tested and I get a definitely sub-linear acceleration, but still a significant speedup using this module. Many thanks for this! Please publish it!

1 Like

The problem with this mixing of atomic/non-atomic operations on the same types/data is that you lose all benefit of statically ensured atomicity in the first place. If you allow both to happen in principle on the same data, you open yourself/your code up to the possibility of some non-atomic code running during the atomic portion, making a mess of things (and this doesn’t have to be initiated by you - even if the code you’ve written would not mess with things, a user library loading your code could just as well mess with the data in an unsafe manner). The solution here is to separate these two concepts: work on atomic types when requiring atomics and move the data to non-atomic types when not requiring them. I suspect this moving is more expensive than just keeping the atomics around, though.

There is a sort of middle ground in the form of lock-free programming, but that only applies to algorithms, not data as you’re proposing.

“The problem with this mixing of atomic/non-atomic operations on the same types/data is that you lose all benefit of statically ensured atomicity in the first place.” - agreed, hence the name of the package is UnsafeAtomics.jl. Caveat emptor, use with care, etc.

That said, if you take care to add a fence in the right places, then you can (manually) ensure that the code is safe. E.g. if all of this is buried inside some function’s code ...; use_unsafe_atomics(); atomic_fence(); ... then the function is safe for its callers.
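A runnable sketch of that two-phase structure, using the existing `Threads.Atomic` type (the fence is technically redundant here because `@threads` joins its tasks before continuing, but it marks the phase boundary explicitly):

```julia
using Base.Threads

counts = [Atomic{Int}(0) for _ in 1:8]

# Atomic phase: parallel threads increment shared counters.
@threads for i in 1:1000
    atomic_add!(counts[mod1(i, 8)], 1)
end

# Phase boundary: ensure all atomic writes are visible before the
# non-atomic phase begins.
atomic_fence()

# Non-atomic phase: plain reads of the collected totals.
total = sum(c[] for c in counts)   # 1000
```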

Sure, copying all the data between the atomic-only and the normal types is possible, but in my case (large vectors) this would be painful (performance is the name of the game here, after all). Especially using the existing Atomic types, which require a heap allocation for each value (shudder) - but even if/when @atomic avoids that, it would still defeat the purpose of the whole thing.

It looks like Arm has Transactional Memory Extension (TME) intrinsics, and LLVM supports it, so maybe it’s possible to implement a similar algorithm.

(BTW, a nitpick: TSX is Intel-specific rather than x86-specific. AMD has/had different proposal Advanced Synchronization Facility - Wikipedia but it looks like it’s not implemented.)

The same memory location can be used for atomic and non-atomic access in the C/C++ memory model (which is the basis of Julia’s memory model) as long as there is another mechanism ensuring ordering. Indeed, C++20 has std::atomic_ref, which can be used for atomic operations on locations declared non-atomic. However, as orenbenkiki pointed out, the programmer is responsible for ensuring that at each phase of the program, the access is only atomic or only non-atomic.

It seems that dealing with arrays was one of the motivations for std::atomic_ref: 3.2. Atomic Operations on Members of a Very Large Array - p0019R8: Atomic Ref

Also, merely declaring the location atomic is usually not enough. Mixing orderings has a similar issue. For example, interestingly, you can break a previously well-behaving program by adding a relaxed memory access: C++Now 2018: Tony Van Eerd “The Continuing Saga of the Lock-free Queue: Part 3 of N” - YouTube (see the comment around 1:13:02). It says “probably UB” but this is now UB in the C++20 memory model, IIUC.

Also, a related issue:

FYI, not registered yet, but I extracted it out as GitHub - JuliaConcurrent/UnsafeAtomics.jl and used it for GitHub - JuliaConcurrent/AtomicArrays.jl (Now all I need is to bite the bullet and register it… Not sure I want to :laughing:)