It is probably easier to port the CL version that I posted instead of @sdanisch's one, at the cost of performance.
Right now CuArrays is rough to install because it needs Julia to be built from source in order to do its codegen. That'll change with Julia's v0.7/1.0 release though. I still haven't gotten it to work on v0.6 at all, so YMMV.
In theory, you just need to prefix all math intrinsics (e.g. log, sqrt, max) with CUDAnative.
Stuff like @linearidx, gpu_call and gpu_rand! comes from GPUArrays and should work for both CuArrays and CLArrays!
So basically, you need to replace anything that doesn't come from GPUArrays and is not pure Julia.
Have a look at: https://github.com/JuliaGPU/CUDAnative.jl/blob/master/src/device/libdevice.jl for a more exhaustive list of what functions you need to replace!
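To make the prefixing concrete, here is a rough sketch of what a ported kernel might look like. The kernel name sqrtlog_kernel and the arrays are made up for illustration, and it assumes the Julia 0.6-era APIs of GPUArrays, CuArrays and CUDAnative (a kernel that takes state as its first argument, @linearidx(A, state), and gpu_call(kernel, A, args)). I have not run it against any particular package version, so treat it as a sketch rather than working code.

```julia
# Sketch only: assumes the Julia 0.6-era GPUArrays / CuArrays / CUDAnative APIs.
using GPUArrays, CuArrays, CUDAnative

# Hypothetical kernel: out[i] = sqrt(log(a[i])).
function sqrtlog_kernel(state, out, a)
    # @linearidx comes from GPUArrays, so it needs no prefix;
    # it also returns early when the index is past the end of `out`.
    i = @linearidx(out, state)
    # Base.log / Base.sqrt are swapped for the CUDAnative intrinsics.
    out[i] = CUDAnative.sqrt(CUDAnative.log(a[i]))
    return
end

a   = CuArray(rand(Float32, 1024) .+ 1f0)  # inputs > 1, so log(a[i]) is positive
out = similar(a)

# Called without a configuration, so the default (length(out)) is used.
GPUArrays.gpu_call(sqrtlog_kernel, out, (out, a))
```

Only the two math calls carry the CUDAnative prefix; the indexing macro and the launch go through GPUArrays unchanged.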
Hi,
I want to come back to gpu_call here. It is called with the default configuration = length(A). Is that the best way to set the number of threads? How can this be optimised?
Thank you
Best