CUDA Float64 Float32 difference

andrey2185 · February 28, 2026, 3:40pm

Hello. I am using an RTX 3070. Why is there no timing difference between Float32 and Float64 here?

using CUDA, BenchmarkTools
function add!(X::AbstractVector{T}) where T
    return X .+= T(1)
end
A = CUDA.zeros(Float32, 10_000_000);
B = CUDA.zeros(Float64, 10_000_000);
@btime add!(A);  # -> 5.250 μs (37 allocations: 1.09 KiB)
@btime add!(B);  # -> 5.267 μs (37 allocations: 1.09 KiB)

andrey2185 · February 28, 2026, 3:45pm

using CUDA, BenchmarkTools
function add!(X::AbstractVector{T}) where T
    return X .+= T(1)
end
A = CUDA.zeros(Float32, 10_000_000);
B = CUDA.zeros(Float64, 10_000_000);
@btime CUDA.@sync add!(A);  # -> 225.800 μs (91 allocations: 1.94 KiB)
@btime CUDA.@sync add!(B);  # -> 437.800 μs (77 allocations: 1.72 KiB)
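The two posts differ because CUDA.jl queues GPU operations such as the broadcast `X .+= T(1)` asynchronously. Without `CUDA.@sync`, the `@btime` figures (about 5 μs for both types) mostly reflect launch overhead, not the kernel itself. With synchronization, the Float64 run takes roughly twice as long as the Float32 run. That is what you would expect from a memory-bound kernel, because a Float64 element is 8 bytes and a Float32 element is 4. The sketch below converts the posted timings into an effective bandwidth. It assumes each element is read once and written once. The helper `effective_bandwidth_gbps` is a hypothetical name used only for illustration.

```julia
# Effective memory bandwidth implied by the CUDA.@sync timings above.
# Assumption: X .+= T(1) reads each element once and writes it once.
n = 10_000_000  # vector length used in the benchmark

# Hypothetical helper: GB/s (10^9 bytes per second) moved by the kernel.
function effective_bandwidth_gbps(n, bytes_per_elem, seconds)
    bytes_moved = 2 * n * bytes_per_elem  # one read + one write per element
    return bytes_moved / seconds / 1e9
end

bw32 = effective_bandwidth_gbps(n, sizeof(Float32), 225.8e-6)  # 4-byte elements
bw64 = effective_bandwidth_gbps(n, sizeof(Float64), 437.8e-6)  # 8-byte elements
ratio = 437.8 / 225.8                                          # Float64 time / Float32 time

println("Float32: ", round(bw32; digits = 1), " GB/s")  # prints "Float32: 354.3 GB/s"
println("Float64: ", round(bw64; digits = 1), " GB/s")  # prints "Float64: 365.5 GB/s"
println("time ratio: ", round(ratio; digits = 2))       # prints "time ratio: 1.94"
```

Both element types reach a similar effective rate of roughly 350-365 GB/s. That is a large share of the RTX 3070's rated 448 GB/s memory bandwidth, so the near 2x gap mainly reflects the 2x data volume. Consumer GPUs have much lower FP64 than FP32 arithmetic throughput. Even so, a single addition per element is probably not enough work to make arithmetic the bottleneck here.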
