You should follow @vchuravy’s advice:
Doing so, it’s immediately clear that the call to squares_rng is not inferred:
julia> @device_code_warntype interactive=true main()
call #squares_rng(::Int64,::Any)::Union{}
Note the ::Any: you’re using an untyped global variable, which the Performance Tips section of the Julia manual warns against. Furthermore, your first argument idx is of type Int64, while you explicitly define squares_rng to only accept UInt64.
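To make those two fixes concrete, here is a minimal CPU-only sketch. I don't have your actual code, so the body of `squares_rng` (a counter-based "squares" style generator), the `KEY` constant and its value, and the variable `idx` are all assumptions standing in for your definitions. The point is only the `const` on the global and the explicit conversion to `UInt64`.

```julia
# Hypothetical key; `const` gives the global a fixed, inferable type (UInt64)
# instead of the ::Any seen in the @device_code_warntype output.
const KEY = 0x548c9decbce65297

# Assumed stand-in for your squares_rng: accepts only UInt64 arguments.
function squares_rng(ctr::UInt64, key::UInt64)::UInt32
    x = ctr * key          # UInt64 arithmetic wraps around on overflow
    y = x
    z = y + key
    x = x * x + y; x = (x >> 32) | (x << 32)   # round 1
    x = x * x + z; x = (x >> 32) | (x << 32)   # round 2
    x = x * x + y; x = (x >> 32) | (x << 32)   # round 3
    return ((x * x + z) >> 32) % UInt32        # round 4, keep the high 32 bits
end

idx = 1                               # an Int64, like the index in your kernel
r = squares_rng(UInt64(idx), KEY)     # convert explicitly so the method matches
```

Calling `squares_rng(idx, KEY)` without the conversion has no matching method, which is why inference ends in `Union{}`.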
Fixing both then leads to the next issue: your sample_extreme_values function is declared to return ::Float32, while it actually returns an Array{Float32}.
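A small CPU-only sketch of that return-type mismatch, using hypothetical functions (`sample_bad`, `sample_vec`, `sample_scalar`) rather than your `sample_extreme_values`. A return-type annotation makes Julia call `convert` on the returned value, and there is no conversion from an array to a scalar.

```julia
# Declared to return a scalar but returns a Vector: the implicit
# convert(Float32, ::Vector{Float32}) fails when the function is called.
sample_bad(n)::Float32 = rand(Float32, n)

# Option 1: make the annotation match what is actually returned.
sample_vec(n)::Vector{Float32} = rand(Float32, n)

# Option 2: reduce to a scalar so the ::Float32 annotation is honest.
sample_scalar(n)::Float32 = maximum(rand(Float32, n))

# Record whether the mismatched version throws a MethodError.
bad_throws = try
    sample_bad(3)
    false
catch err
    err isa MethodError
end
```

Which option is right depends on what the caller expects. Either way, the annotation and the returned value have to agree.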
If your code is functional and well-typed, then that’s exactly what CUDA.jl does.