Why is accessing a typed global different from dereferencing a const Ref?

I had expected these to be implemented the same way, just loading a value of the specified type from a memory address, but the typed global apparently needs two separate loads. Why is that? There doesn’t seem to be a performance difference, though: @btime and @benchmark report roughly the same figures…

julia> const x = Ref(1.5)
Base.RefValue{Float64}(1.5)

julia> getx() = x[]
getx (generic function with 1 method)

julia> y::Float64 = 1.5
1.5

julia> gety() = y
gety (generic function with 1 method)

julia> @code_llvm getx()
;  @ REPL[20]:1 within `getx`
define double @julia_getx_225() #0 {
top:
; ┌ @ refvalue.jl:56 within `getindex`
; │┌ @ Base.jl:38 within `getproperty`
    %0 = load double, double* inttoptr (i64 140382054219008 to double*), align 256
; └└
  ret double %0
}

julia> @code_llvm gety()
;  @ REPL[21]:1 within `gety`
define double @julia_gety_227() #0 {
top:
  %0 = load atomic double*, double** inttoptr (i64 140382044328152 to double**) unordered, align 8
  %1 = load double, double* %0, align 8
  ret double %1
}

Typed globals are thread-safe, so the first load is a lock. When the lock isn’t contended, both should be pretty fast.

Is that also why setx(v) = x[] = v and sety(v) = global y = v differ much more? In that case, sety does seem slower.
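
A minimal sketch to reproduce the setter comparison as a script, assuming Julia 1.8 or later (where typed globals exist). InteractiveUtils is loaded explicitly because scripts, unlike the REPL, don't load it by default. No IR output is reproduced here, since it varies between Julia versions.

```julia
using InteractiveUtils   # provides @code_llvm outside the REPL

const x = Ref(1.5)   # const Ref: its address is fixed and known to the compiler
y::Float64 = 1.5     # typed global (Julia 1.8+)

setx(v) = x[] = v          # stores into the existing Ref
sety(v) = global y = v     # assigns to the typed global binding (converted to Float64)

setx(2.5)
sety(2.5)

@code_llvm setx(2.5)
@code_llvm sety(2.5)
```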

Exactly.

I had not expected the locking; I didn’t come across it in anything written about typed globals. I wonder why performance was sacrificed for this feature, since local variables and untyped globals aren’t thread-safe either.

I’m pretty sure untyped globals are thread-safe.

I was excited at first and wrote about how happy I was to have thread-safe typed globals.

However, after running the following experiment, this doesn’t seem to be the case:

toggle::Bool = false
counter::Int = 0

function switch()
    global toggle
    global counter
    toggle = !toggle
    counter += 1
end

@sync for _ in 1:1001   # @sync waits for all spawned tasks before continuing
    Threads.@spawn switch()
end

# 12 threads: counter < 1001 (usually < 920) and ~50% true/false for toggle

Am I missing something?

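They are thread-safe in the sense that no two threads can write simultaneously. They are, however, not free of data races: different threads can still read and write in an interleaved, uncoordinated manner. For example, counter += 1 is a separate read and write, so two threads can both read the same old value and both store old + 1, losing an increment.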
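
A minimal sketch of a race-free version of the toggle/counter experiment, assuming Julia 1.8+ for the typed global. The names safe_toggle, safe_counter, toggle_lock and switch_safe are hypothetical. It uses two standard tools: Threads.Atomic with Threads.atomic_add! for the counter, and a ReentrantLock around the read-modify-write of the Bool.

```julia
safe_toggle::Bool = false                     # typed global, guarded by toggle_lock
const safe_counter = Threads.Atomic{Int}(0)   # atomic integer cell
const toggle_lock = ReentrantLock()

function switch_safe()
    global safe_toggle
    # Atomic read-modify-write: no two tasks can read the same old value.
    Threads.atomic_add!(safe_counter, 1)
    # Only one task at a time may read and flip the Bool.
    lock(toggle_lock)
    try
        safe_toggle = !safe_toggle
    finally
        unlock(toggle_lock)
    end
    return nothing
end

@sync for _ in 1:1001
    Threads.@spawn switch_safe()
end
```

With both updates synchronized, the counter should end at 1001 and the toggle, flipped an odd number of times from false, should end up true. The atomic is the lighter option for a single integer; the lock generalizes to updates that touch several values at once.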

This distinction matters mostly for bigger structs that contain multiple objects: the thread safety of globals means you will never get a mix of two objects from different threads (no thread ever achieves a “partial write”); you always end up with the whole object written by one of the threads.
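
A sketch of the whole-value guarantee described above, assuming Julia 1.8+; Point, point, writer and reader are hypothetical names. Each writer replaces the entire immutable Point, so if the guarantee holds, every reader should see matching fields. A passing run only fails to observe tearing; it does not prove it can't happen. Updating a shared mutable struct field by field (m.a = i; m.b = i) gets no such guarantee.

```julia
struct Point
    a::Int
    b::Int
end

point::Point = Point(0, 0)   # typed global holding an immutable struct

function writer(i)
    global point
    point = Point(i, i)   # each assignment publishes one complete Point
    return nothing
end

# A single read of the global should yield one whole Point from one writer.
function reader()
    p = point
    return p.a == p.b
end

readers = [Threads.@spawn reader() for _ in 1:500]
@sync for i in 1:500
    Threads.@spawn writer(i)
end
consistent = all(fetch, readers)
```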

Thanks for the explanation. I hadn’t made that distinction (in my mind, saying something is thread-safe implied that you can safely update the value from different threads).

But I get it now.

There is no locking going on here; the thread safety comes only from the atomic reads (the load atomic ... unordered in the gety output above). The difference lies in the way global bindings are implemented:

When a value is assigned to a binding, a separate box is allocated for each new value, and the module’s binding table then holds a pointer to that box. Reading the global therefore takes two loads: one for the box pointer and one for the value inside the box. This extra layer of indirection doesn’t exist for a constant Ref, which always refers to the same location in memory, so its address can be inlined in codegen.
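
To make the indirection concrete, here is a rough model in plain Julia. It is not Julia’s actual runtime code: ValueBox and BindingModel are hypothetical types, and the default (sequentially consistent) @atomic ordering stands in for the unordered load seen in the IR. Reading goes through two steps, matching the two loads in gety: first the box pointer, then the value inside the box.

```julia
# Hypothetical heap-allocated box holding one value (mutable => has its own address).
mutable struct ValueBox{T}
    value::T
end

# Hypothetical stand-in for a binding-table slot: an atomic pointer to the current box.
mutable struct BindingModel{T}
    @atomic box::ValueBox{T}
end

BindingModel(v::T) where {T} = BindingModel{T}(ValueBox{T}(v))

# Load 1: atomically read the box pointer. Load 2: read the value inside the box.
getbinding(b::BindingModel) = (@atomic b.box).value

function setbinding!(b::BindingModel{T}, v) where {T}
    newbox = ValueBox{T}(convert(T, v))   # a fresh box for every assignment
    @atomic b.box = newbox                # publish the new pointer atomically
    return v
end
```

Because an assignment swaps in a new box instead of overwriting the old one, a reader holding the old pointer still sees the complete old value. A const Ref, by contrast, is a single box whose address never changes, so the compiler can bake that address into the code and skip the first load.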

This is potentially fixable by adding special handling for concretely typed globals, but that will require some careful thought. Fixing it is not a high priority at the moment, since code this performance-sensitive should generally not be accessing global state anyway.
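
A sketch of the usual way to keep global state out of a hot loop, assuming Julia 1.8+; scale_factor and the function names are hypothetical. The global is read once by the caller and passed in as an argument, so the loop body only touches local variables. No timing claims are made here; whether it matters depends on the code.

```julia
scale_factor::Float64 = 2.0   # typed global (Julia 1.8+)

# Reads the global binding inside the loop.
function scaled_sum_global(v)
    s = 0.0
    for x in v
        s += scale_factor * x
    end
    return s
end

# The loop sees only locals; the factor arrives as an argument.
function scaled_sum(v, k::Float64)
    s = 0.0
    for x in v
        s += k * x
    end
    return s
end

# Read the global once, outside the loop, and pass it in.
scaled_sum_hoisted(v) = scaled_sum(v, scale_factor)
```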
