I just built a new PC and, after asking some questions here about performance and getting some interesting answers, I thought it would be fun to start a thread where people can show off their builds and, more importantly (relevantly), how Julia is performing on their hardware. I will (shamelessly) start by flaunting my new build!
Here's my parts list:

CPU: AMD Ryzen 9 3950X 3.5 GHz 16-Core Processor

CPU Cooler: ARCTIC Liquid Freezer II 120 56.3 CFM Liquid CPU Cooler

Motherboard: Asus ROG Strix X570-I Gaming Mini ITX AM4 Motherboard

Memory: Corsair Vengeance LPX 32 GB (2 x 16 GB) DDR4-3600 Memory

Storage: Sabrent Rocket 4.0 1 TB M.2-2280 NVME Solid State Drive

GPU: EVGA GeForce RTX 2080 SUPER 8 GB BLACK GAMING Video Card

Case + Power Supply: InWin A1 Plus Mini ITX Tower Case w/ 650 W Power Supply
It's a small-form-factor PC (mini ITX) so it sits nicely atop my desk without being imposing. The post wouldn't be complete without pics, of course (yes, I'm a total Julia fanboy and I put that decal on my brand new case and I'm not ashamed one bit):
I struggled a bit to find some "standard" code for benchmarking, but I decided on the following (from the CUDA.jl docs; I just made the arrays longer):
using BenchmarkTools
using CuArrays

N = 2^20
x = fill(1.0f0, N)  # a vector filled with 1.0 (Float32)
y = fill(2.0f0, N)  # a vector filled with 2.0

y .+= x  # broadcast add on the CPU

# Sequential add on a single CPU thread
function sequential_add!(y, x)
    for i in eachindex(y, x)
        @inbounds y[i] += x[i]
    end
    return nothing
end

fill!(y, 2)  # reset y to 2.0

# Multithreaded add on the CPU (uses however many threads Julia was started with)
function parallel_add!(y, x)
    Threads.@threads for i in eachindex(y, x)
        @inbounds y[i] += x[i]
    end
    return nothing
end

# Parallelization on the GPU
x_d = CuArrays.fill(1.0f0, N)  # a vector stored on the GPU filled with 1.0 (Float32)
y_d = CuArrays.fill(2.0f0, N)  # a vector stored on the GPU filled with 2.0

y_d .+= x_d

function add_broadcast!(y, x)
    CuArrays.@sync y .+= x  # wait for the GPU to finish before returning
    return
end
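Since parallel_add! uses Threads.@threads, its timing depends on how many threads the Julia session was started with (for example via the JULIA_NUM_THREADS environment variable). A minimal sketch for checking this before comparing numbers across machines; it only uses Base and is not part of the original benchmark:

```julia
using Base.Threads

# Number of threads this Julia session can use for Threads.@threads loops
println("Julia threads: ", nthreads())

# Number of logical CPU threads Julia detects on this machine
println("CPU threads:   ", Sys.CPU_THREADS)

# With a single thread, parallel_add! cannot be faster than sequential_add!
if nthreads() == 1
    println("Only one thread: set JULIA_NUM_THREADS before starting Julia.")
end
```

Posting the thread count alongside the timings makes the results easier to compare from one build to another.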
With N = 2^20, I get the following results:
julia> @btime sequential_add!($y, $x)
72.500 μs (0 allocations: 0 bytes)
julia> @btime parallel_add!($y, $x)
37.599 μs (114 allocations: 13.67 KiB)
julia> @btime add_broadcast!($y_d, $x_d)
70.401 μs (56 allocations: 2.22 KiB)
With N = 2^27, it looks like this:
julia> @btime sequential_add!($y, $x)
60.721 ms (0 allocations: 0 bytes)
julia> @btime parallel_add!($y, $x)
57.512 ms (114 allocations: 13.67 KiB)
julia> @btime add_broadcast!($y_d, $x_d)
3.754 ms (56 allocations: 2.22 KiB)
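One way to read the N = 2^27 numbers is to convert them into effective memory bandwidth. The sketch below is back-of-the-envelope arithmetic, not a measurement: it assumes each element costs 12 bytes of memory traffic (read x[i], read y[i], write y[i], all Float32), and the helper name bandwidth_gbs is made up for illustration.

```julia
# Effective memory bandwidth implied by the N = 2^27 timings quoted above.
# Assumption: 3 Float32 accesses per element (two reads, one write).
N = 2^27
bytes_moved = 3 * sizeof(Float32) * N

# Minimum times reported by @btime, in seconds
timings = [
    "sequential_add!" => 60.721e-3,
    "parallel_add!"   => 57.512e-3,
    "add_broadcast!"  => 3.754e-3,
]

# Convert a run time in seconds into GB/s (1 GB = 10^9 bytes)
bandwidth_gbs(t) = bytes_moved / t / 1e9

for (name, t) in timings
    println(rpad(name, 16), round(bandwidth_gbs(t), digits = 1), " GB/s")
end
```

Under that assumption the two CPU versions land at roughly 26.5 and 28 GB/s and the GPU at roughly 429 GB/s. That would be consistent with the large case being limited by memory bandwidth rather than by core count, which may be why threading barely helps at this size, though this is an interpretation and not something the benchmark itself proves.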
Go ahead, Julians, discard your modesty and show us what Julia can do on your PC!!!