Hazard pointers in C++26 vs Julia - concurrency/atomics, e.g. for UInt128

I hadn’t seen this concept before, and wanted to let others know, and to ask how it compares to Julia; but to answer my own question:

In a multithreaded computing environment, hazard pointers are one approach to solving the problems posed by dynamic memory management of the nodes in a lock-free data structure. These problems generally arise only in environments that don’t have automatic garbage collection.

So Julia doesn’t need it, nor likely much of the new support in C++20, C++23, and coming in C++26. But you CAN program in Julia at a lower level, avoiding the GC, and then you might…

One way to deal with concurrency is to use only immutables, which e.g. rules out the default Dict, but the recent HAMT implementation helps with that. With no mutables (which also rules out the default mutable arrays/vectors, but not the default strings, which are immutable) there are NO bug or security issues regarding concurrency.
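A minimal sketch of that style, assuming Julia ≥ 1.11’s HAMT-based Base.PersistentDict (the version and names are my assumption, not from this thread): every “update” returns a new dict and the old snapshot is never mutated, so readers on other threads can never observe a half-written table.

d  = Base.PersistentDict{Symbol,Int}()        # empty persistent (HAMT) dict
d1 = Base.setindex(d, 1, :usd)                # non-mutating: `d` is unchanged
d2 = Base.setindex(d1, 92, :inr)
@assert d2[:inr] == 92 && !haskey(d1, :inr)   # old snapshots stay valid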

Running Julia with its default, i.e. without -t auto or some thread count other than 1, avoids some of these bug issues. But concurrency is frequently wanted for speed.

Then you need locks and/or atomic access. Types up to U/Int64 allow atomic access with the default alignment, but it’s not clear to me what happens with e.g. UInt128. @Elrod On all modern CPUs with caches I believe there’s no problem, assuming the value doesn’t split across cache lines, but UInt128 doesn’t get 128-bit alignment, only 64-bit. I think that decision for Julia was made to not waste space, emulating other languages such as C++. It’s been a little hard for me to dig into whether it’s atomic in Julia or other languages, since it depends on the CPU more than on the language spec. ChatGPT has not been very helpful, basically answering “it depends” on the CPU. But I want a guarantee for all (modern) CPUs, and not just x86; though even just knowing for the most common, e.g. x86 and ARM, would be nice.

Is the rule that you are unsafe for larger types/structs without explicit locking? I want to know for the case without locking, i.e. “lock-free”. Note, I think BigFloat/BigInt are actually safe, since they are hidden behind a (no-larger-than-64-bit) pointer.

E.g. for a rational of Int64: the default Rational is likely not safe. I’m asking because I’m implementing a new one, a SIMD rational…
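To make the concern concrete, a hedged sketch of my own (not code from the thread): Rational{Int64} is a 16-byte immutable, so a plain field holding one won’t be read or written atomically by default; marking the field @atomic asks Julia to pad and align it, possibly falling back to a lock where the target lacks 16-byte atomics, as shown further down.

@assert sizeof(Rational{Int64}) == 16   # two Int64 fields
mutable struct RatCell
    @atomic r::Rational{Int64}          # atomic (or lock-protected) field
end
c = RatCell(1//3)
@atomic c.r = 2//5                      # whole-value store, no torn halves
@assert (@atomic c.r) == 2//5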

To motivate the use of hazard pointers, let us consider a problem: we want to implement a concurrent key-value map that satisfies the Write-Rarely-Read-Many (WRRM) property. WRRM maps are commonly used as caches, and in use cases such as a lookup table from currencies to exchange rates, where the map is looked up many times but updated at a comparatively much slower rate.

In other words, we want to implement a multithreaded-access map interface, using a single-threaded map object such as std::map.
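In Julia terms (a hedged sketch of my own, not code from the proposal), GC is exactly what makes a WRRM map easy without hazard pointers: readers grab an immutable snapshot, the rare writer atomically publishes a new one, and the GC reclaims the old table only once no reader still references it.

mutable struct WRRMMap{K,V}
    @atomic table::Base.ImmutableDict{K,V}   # snapshot, swapped atomically
end
WRRMMap{K,V}() where {K,V} = WRRMMap{K,V}(Base.ImmutableDict{K,V}())

readkey(m::WRRMMap, k) = (@atomic m.table)[k]   # many cheap readers
function write!(m::WRRMMap{K,V}, k::K, v::V) where {K,V}
    # rare writer; concurrent writers would need @atomicreplace or a lock
    @atomic m.table = Base.ImmutableDict((@atomic m.table), k => v)
    return m
end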

1.4 Rationale for inclusion in C++26
This proposal is based on the experience gained from the hazard pointer implementation in the Facebook Folly open-source library (since 2016) and its heavy use in production (since 2017).

Why in std:
https://www.reddit.com/r/cpp/comments/sywjf3/p2530_hazard_pointers_belong_in_the_standard/

I’m nominally a coauthor, although just because of my work on the original hazard pointer paper. The paper gives a pitch for “sooner rather than later”, but doesn’t really emphasize the answer to “why include it” […]

It was a bit too much for me to start a 5-hour video, “Implementing Hazard Pointers in Rust”, on a language I don’t (currently) use:

“Brief” introduction:

Threads.Atomic{UInt128} has 16-byte alignment:

julia> x = Threads.Atomic{UInt128}(4);

julia> @code_llvm debuginfo = :none x[]
define void @julia_getindex_306(i128* noalias nocapture noundef nonnull sret(i128) align 8 dereferenceable(16) %0, {}* noundef nonnull align 8 dereferenceable(16) %1) #0 {
top:
  %2 = bitcast {}* %1 to i128*
  %rv.i = load atomic i128, i128* %2 acquire, align 16
  store i128 %rv.i, i128* %0, align 8
  ret void
}

so our load here is align 16.

If you mark a field @atomic, the mutable struct will be padded appropriately:

julia> mutable struct Foo
           x::Int
           @atomic y::UInt128
           z::Int
       end

julia> mutable struct Bar
           x::Int
           y::UInt128
           z::Int
       end

julia> sizeof(Foo), sizeof(Bar)
(48, 32)

but the Foo struct itself still has 8-byte alignment instead of 16, defeating the benefit of padding it.
Thus, loading requires a lock/unlock:

julia> gety(x::Foo) = x.y
gety (generic function with 1 method)

julia> @code_llvm debuginfo = :none gety(Foo(1, 1, 1))
define void @julia_gety_393(i128* noalias nocapture noundef nonnull sret(i128) align 8 dereferenceable(16) %0, {}* noundef nonnull align 8 dereferenceable(48) %1) #0 {
top:
  %2 = bitcast {}* %1 to i8*
  %3 = getelementptr inbounds i8, i8* %2, i64 24
  call void @jl_lock_value({}* %1)
  %4 = bitcast i8* %3 to i128*
  %5 = load i128, i128* %4, align 8
  call void @jl_unlock_value({}* %1)
  store i128 %5, i128* %0, align 8
  ret void
}

instead of a single atomic operation.

I would suggest sticking with the primitives, and checking the generated code as I did above to get an idea of whether codegen is good.
You could file an issue over the alignment of Foo being too small to allow efficient atomic access (or over any other cases of suboptimal codegen you encounter).
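For example, a hedged sketch of “sticking with the primitives” (my code, using only documented Base.Threads calls): Threads.Atomic{UInt128} is 16-byte aligned, so loads, stores, and CAS can lower to real 128-bit atomics where the target supports them; check with @code_llvm as above.

using Base.Threads
a = Atomic{UInt128}(zero(UInt128))
a[] = UInt128(42)                 # atomic store
old = a[]                         # atomic load
atomic_cas!(a, old, old + 1)      # 128-bit compare-and-swap
@assert a[] == 43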


I usually trust you since you’re the expert, but I saw:

[quoted: a snippet reporting an alignment of 8]

So it only seems to be the “minimum” guaranteed limit. Maybe 16 IS guaranteed on x86, but I want that guarantee always(?). Do you think I always have it despite that, or that maybe I don’t need it?

It should be 16-byte aligned (by default), but I’m not sure that is guaranteed on all platforms:

One reason is that most SSE2 instructions on x86 require the data to be 128-bit aligned. This design decision would have been made for performance reasons and to avoid overly complex (and hence slow and big) hardware.

FYI, existing on x86:
https://www.felixcloutier.com/x86/lddqu

LDDQU — Load Unaligned Integer 128 Bits

I do see in julia.h (implying it might not always be 16):

// int128 is 16 bytes aligned on aarch64

In jl_compute_field_offsets, MAX_ALIGN is only 8.
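One can also check what Julia itself reports at the REPL (a hedged check of my own; Base.datatype_alignment is an internal helper, and the value is platform- and version-dependent):

Base.datatype_alignment(UInt128)   # expected to report 8 (the MAX_ALIGN cap), not 16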

It seems to be fixed in Zig, but not in Rust (and why not, for such a long time? It seems like a simple fix):

u128/i128 are not FFI-safe when they should be. That’s a bug. That’s why this issue exists and is still open. We also don’t claim that they are FFI safe, so it’s not a soundness bug.

@cmpxchg on x86_64 supports u128 integers. However @alignOf(u128) reports 8, and without align(16) on the u128, the code segfaults at runtime.

[Now fixed in Zig. If it isn’t fixed in Julia, Julia would also segfault when using that instruction, so it likely isn’t actually 16-byte aligned there. I suppose that instruction IS used.]

Would you say all structs should be at most 16 bytes (or 8, 4, etc.) for concurrent programs (or else use locking)?

Arrays are 64-byte aligned if large, otherwise only 16-byte aligned, so larger elements, e.g. 17- or 32-byte ones, have the possibility of straddling a cache-line boundary.
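A quick sanity check with a hypothetical helper (straddles is my own function, not from Base): with only a 16-byte base alignment, a 32-byte element can span two 64-byte cache lines.

straddles(base, i, elsize, line = 64) =
    (base + (i - 1) * elsize) ÷ line != (base + i * elsize - 1) ÷ line
@assert  straddles(16, 2, 32)   # element 2 occupies bytes 48–79: two lines
@assert !straddles(16, 1, 32)   # element 1 occupies bytes 16–47: one line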

If you can obtain a pointer from somewhere, you can use unsafe_wrap to create a Julia array (see the raw sketch after the example below). I created ArrayAllocators.jl to play with the concept.

https://mkitti.github.io/ArrayAllocators.jl/stable/api/#ArrayAllocators.MemAlign

julia> using ArrayAllocators

julia> A = Array{Int}(MemAlign(2^16), 16);

julia> pointer(A)
Ptr{Int64} @0x00000000032e0000

julia> B = Array{Int}(MemAlign(2^8), 16);

julia> pointer(B)
Ptr{Int64} @0x00000000025c1100

julia> C = Array{Int}(MemAlign(2^4), 16);

julia> pointer(C)
Ptr{Int64} @0x0000000000728fd0
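A minimal unsafe_wrap sketch to go with the above (my code, not ArrayAllocators.jl’s internals): over-allocate with Libc.malloc, round the pointer up to a 16-byte boundary, and wrap it; the caller stays responsible for freeing the raw pointer.

n   = 16
raw = Libc.malloc(n * sizeof(UInt128) + 16)            # over-allocate by the alignment
p   = Ptr{UInt128}((UInt(raw) + 15) & ~UInt(15))       # round up to 16-byte boundary
A   = unsafe_wrap(Vector{UInt128}, p, n; own = false)  # own=false: we free `raw` ourselves
A[1] = one(UInt128)
# ... use A ..., then eventually: Libc.free(raw)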

FWIW, the failure here was on the C side, not on Julia’s side: we could not get the supported C compilers to correctly emit this 16-byte load across all of our supported targets. We are fairly close now, though, and could probably “flip the switch” on these.
