Why Atomix.@atomic b[] += a[i] works and b[] = b[] + a[i] does not

Hi all,

I am updating my lecture on GPU programming, and this year I would like to have kernels written in CUDA.jl and KernelAbstractions.jl side by side. The idea is to show students how similar GPU accelerators are, and that with the Julia ecosystem they can write fairly general code.

I am working on the classic reduction example, which @maleadt uses in his talks, but I have encountered some weird behavior.

Specifically, this kernel works:

using Metal, BenchmarkTools
using KernelAbstractions
import KernelAbstractions as KA
using Atomix

@kernel function reduce_atomic(op, a, b)
    i = @index(Global)
    # atomic read-modify-write: each work-item adds a[i] into the scalar b[]
    Atomix.@atomic b[] += a[i]
end

x = rand(Float32, 1024, 1024);
cx = MtlArray(x);
backend = KA.get_backend(cx);
cb = MtlArray([0f0]);  # single-element output buffer
reduce_atomic(backend, 64)(+, cx, cb, ndrange=size(cx))
Metal.GPUArraysCore.@allowscalar cb[]  # matches the CPU reference
sum(x)

while this one does not:

@kernel function reduce_atomic(op, a, b)
    i = @index(Global)
    # the same kernel, but written as an assignment instead of +=
    Atomix.@atomic b[] = b[] + a[i]
end

x = rand(Float32, 1024, 1024);
cx = MtlArray(x);
backend = KA.get_backend(cx);
cb = MtlArray([0f0]);
reduce_atomic(backend, 64)(+, cx, cb, ndrange=size(cx))
Metal.GPUArraysCore.@allowscalar cb[]  # does not match sum(x)
sum(x)

Can anyone please help me understand what is going on?

The @atomic macro only does a syntactic analysis of the operator separating the left-hand side from the right-hand side.

So it is semantically different to write +=, which we turn into an atomic increment (a single atomic read-modify-write), and =, which is just an atomic assignment.

The right-hand-side expression is evaluated independently, outside the atomic operation.
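
To see the difference in action, here is a minimal CPU sketch of the same race on a plain Array (this assumes Julia is started with several threads, e.g. julia -t 8; the function names are made up for illustration):

using Atomix
using Base.Threads: @threads

# Atomic read-modify-write: read, add, and write happen as one
# indivisible operation, so no increment is ever lost.
function sum_rmw(a)
    b = zeros(Int, 1)
    @threads for i in eachindex(a)
        Atomix.@atomic b[1] += a[i]
    end
    return b[1]
end

# Assignment form: the RHS reads b[1] non-atomically, then the result
# is stored atomically. Two threads can read the same old value, and
# one increment overwrites the other (a lost update).
function sum_store(a)
    b = zeros(Int, 1)
    @threads for i in eachindex(a)
        Atomix.@atomic b[1] = b[1] + a[i]
    end
    return b[1]
end

a = ones(Int, 1_000_000)
sum_rmw(a)    # always 1000000
sum_store(a)  # typically less than 1000000 with multiple threads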

That makes a lot of sense. Can you please recommend how to write it correctly?

@kernel function reduce_atomic(op, a, b)
    i = @index(Global)
    Atomix.@atomic b[] += a[i]
end

This is correct, no?

Perhaps the confusion is that CUDA.@atomic is different from Atomix.@atomic; IIRC the CUDA version might support the other variant as well, but that is not portable.
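
Since you want CUDA.jl and KernelAbstractions.jl side by side, the CUDA.jl counterpart of the working kernel could look roughly like this (an untested sketch; reduce_atomic_cuda! and the launch configuration are made up for illustration):

using CUDA

# plain CUDA.jl version of the same atomic reduction
function reduce_atomic_cuda!(a, b)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(a)
        CUDA.@atomic b[1] += a[i]  # atomic read-modify-write, as with Atomix
    end
    return nothing
end

a = CUDA.rand(Float32, 1024 * 1024)
b = CUDA.zeros(Float32, 1)
@cuda threads=64 blocks=cld(length(a), 64) reduce_atomic_cuda!(a, b)
CUDA.@allowscalar b[1]  # compare with sum(a)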

This is correct, but it does not use op. What if op = max?

I would be fine knowing that CUDA.@atomic is more general and Atomix.@atomic supports only a subset, but in a portable way. I just do not want to rule out a possible solution.

Thanks a lot for help!

It’s less that CUDA.@atomic is more general. It was implemented first; then we added Atomix to support the CPU and other backends, and the Atomix design ended up influencing Base.@atomic.

So the general syntax should actually be:

@kernel function reduce_atomic(op, a, b)
    i = @index(Global)
    # three-argument form: apply op atomically, mirroring Base.@atomic
    Atomix.@atomic b[] op a[i]
end

But I am unsure whether that is currently supported by all backends.
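
If your backend does support it, a hypothetical call with op = max could look like this (untested sketch; note that the output buffer has to be initialized with the identity element of op, not with zero):

x = rand(Float32, 1024, 1024);
cx = MtlArray(x);
cb = MtlArray([typemin(Float32)]);  # identity element for max
reduce_atomic(KA.get_backend(cx), 64)(max, cx, cb, ndrange=size(cx))
Metal.GPUArraysCore.@allowscalar cb[]  # should match maximum(x)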

Thanks a lot, I understand now.