function fun3(x; n=36)
    result = zero(typeof(x))
    for i in 1:n
        result += cos(i * x)
    end
    return result * sin(x)
end
You can change n to be the number of terms.
Also, the original expression not compiling is probably a bug that should be reported. I am not sure whether CUDA.jl is the right package to report it against, but I am sure someone else can point you in the right direction.
EDIT: Changed mapreduce to a normal for loop as mapreduce would not compile. Current code works.
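For anyone wanting to try this, a minimal usage sketch might look like the following (assuming CUDA.jl is installed and a GPU is available; the array size matches the 1000×1000 matrix mentioned below):

```julia
using CUDA

function fun3(x; n=36)
    result = zero(typeof(x))
    for i in 1:n
        result += cos(i * x)
    end
    return result * sin(x)
end

A = CUDA.rand(1000, 1000)  # random Float32 matrix on the GPU
B = fun3.(A)               # broadcasting compiles a single fused kernel
```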
Rewriting the sum as a loop works, but this is just a minimal example. I want to figure out the largest expression that CUDA.jl can handle.
Actually, we want to broadcast an expression without any regular patterns that is about 10000 lines long. We have tried this in C++ with OpenACC and it fails to compile, although OpenACC handles an 800-line expression well.
Of course, we could separate this large expression into many small pieces, perhaps with some metaprogramming tricks. It would be very nice if CUDA.jl could do this automatically.
Thanks for your nice code. It works and is very efficient: it takes only about 167 μs when broadcast over a 1000×1000 random matrix for n=35, while the ‘original expression’ takes about 4.6 ms.
The difference in timing there is almost certainly because the compiled kernel is smaller and many more threads can run in parallel.
The issue you’re seeing, both here and with OpenACC, is likely a hardware limit on the maximum kernel size your hardware will accept. And even when such kernels do run, as you saw above, they’re not going to take much advantage of the parallel capabilities of CUDA hardware.
I would recommend breaking your ‘10000 variable’ kernels down into a sequence of much smaller kernels, each small enough to take advantage of GPU parallelism, and simply running them in sequence.
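As a rough sketch of that idea, the big sum can be split into pieces that each compile into their own small kernel, with the partial results accumulated on the GPU. The `fun_part1`/`fun_part2` functions below are hypothetical stand-ins for slices of the real expression:

```julia
using CUDA

# Hypothetical pieces of the large expression, each small enough
# to compile into an efficient kernel on its own.
fun_part1(x) = sum(cos(i * x) for i in 1:18)
fun_part2(x) = sum(cos(i * x) for i in 19:36)

A = CUDA.rand(1000, 1000)
acc = CUDA.zeros(Float32, size(A))  # accumulator for partial results

# Run the small kernels in sequence, accumulating into `acc`.
acc .+= fun_part1.(A)
acc .+= fun_part2.(A)

result = acc .* sin.(A)  # apply the common sin(x) factor last
```

Each broadcast launches its own kernel, so no single kernel grows beyond what the hardware and compiler can handle, at the cost of a few extra passes over the data.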
Well, you shouldn’t expect ANY library to be able to “just” compile an arbitrary function vectorized onto GPUs. It’s more or less the optimizations and heuristics of the compiler and front end that determine how much you can get away with without thinking, but eventually you will have to think.