Running For loops on GPU



I am trying to run a simulation in which:

  1. I repeat the same function multiple times
  2. I change several parameters between runs

Is there a way to do either of these via a for loop, or in some other way, on the GPU?
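To make the pattern concrete, on the CPU I would write something like this (`simulate` and `gain` are just placeholders for my actual model):

```julia
# Placeholder for the real simulation function.
simulate(x; gain = 1.0) = gain * sin(x)

# The parameter values I want to sweep over.
params = [0.5, 1.0, 2.0]

# Repeat the same function once per parameter value.
results = [simulate(1.0; gain = p) for p in params]
```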

I am new to Julia, so any help or guidance would be great.


You are asking this question at a very interesting time (for us). Since you are new to Julia, this might not be the easiest topic to dive right into, and things are still very much under development.

Broadly speaking, you currently have three options for GPU programming in Julia:

  1. OpenCL.jl and CUDArt.jl provide access to the OpenCL and CUDA environments: you write kernels in OpenCL/CUDA C and execute them on the GPU, but do the management from Julia
  2. ArrayFire.jl provides a high-level interface to the ArrayFire library allowing you to run specific functions on the GPU
  3. Julia 0.6 comes with the necessary support to compile Julia code directly to the GPU. The GPU compiler is implemented in CUDAnative.jl

CUDAnative.jl and GPUArrays.jl

CUDAnative provides a CUDA-like programming environment but it has a certain set of limitations (due to the fact that GPU programming is quite different from programming for the CPU). To try it out you will need to build Julia v0.6 from source and follow the instructions in the CUDAnative.jl readme.
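To give a flavor, a CUDAnative kernel looks much like its CUDA C counterpart, but is written in Julia. Here is a rough sketch along the lines of the readme's vector-add example (it requires an NVIDIA GPU and a source build of Julia, and the launch syntax may still change):

```julia
using CUDAnative, CUDAdrv

# Each GPU thread handles one element -- the body of what would be a for loop.
function vadd(a, b, c)
    i = threadIdx().x
    c[i] = a[i] + b[i]
    return nothing
end

len = 32
a, b = rand(Float32, len), rand(Float32, len)
d_a, d_b, d_c = CuArray(a), CuArray(b), CuArray(zeros(Float32, len))

@cuda (1, len) vadd(d_a, d_b, d_c)   # launch 1 block of `len` threads
@assert Array(d_c) ≈ a .+ b          # copy back and check on the CPU
```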

We eventually aim to provide a higher-level interface in the form of GPUArrays.jl, but both CUDAnative and GPUArrays are still experimental (although we welcome early adopters).

I hope this gives you a good enough introduction to what you can currently do.


These seem to provide low-level or functional programming. Will it ever be possible to compile an arbitrary for loop? I think not, due to the GPU programming model and the fact that many loops cannot be parallelized. Is this correct?


That is correct, although "programming model" is a bit misleading, because the constraining factor is really the hardware.
In theory you could compile arbitrary code to the GPU, but obviously no one is motivated to implement the missing bits when the result would run slower than on the CPU.
In GPUArrays, I'm working on abstractions that make it a bit easier to run the body of a for loop on the GPU :wink:
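For illustration, the idea is that you express the loop body once and let the array type run it on the GPU. A rough sketch, with names assumed since the GPUArrays API is still in flux and a working GPU backend is required:

```julia
using GPUArrays   # assumes a GPU backend has been set up

x = GPUArray(rand(Float32, 10^6))

# Instead of `for i in eachindex(x); y[i] = 2x[i] + 1; end`,
# the loop body becomes a single fused broadcast, which can be
# compiled into one GPU kernel:
y = 2f0 .* x .+ 1f0
```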
If you have a NVIDIA GPU and quite a bit of patience, we can try to figure things out!


I like this idea, and I have both :stuck_out_tongue:


Damn it! :smiley:

Well, first you need to get CUDAnative running.
Install instructions are in the README.
If that works, feel free to open a documentation issue on GPUArrays, and I will try to respond to that :wink:


I’ve updated the README, nowadays it just involves building julia/master from source and using the package manager. Expect a blogpost with more details very soon (say, next week).


Can you guide me on how to build Julia 0.6? Are the steps similar to those for 0.5? Where can I find the source code? The GitHub version seems to be 0.5.

Also, will I need to build LLVM separately, or can I do that with Pkg.add("llvm")?


What platform are you on? The build steps, including how to get the source, are documented in Julia's README.
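For reference, the from-source build on Linux/macOS is roughly the following (see the README for prerequisites and platform details). There is no separate LLVM step and no `Pkg.add` involved, since `make` builds Julia's bundled LLVM for you:

```shell
git clone https://github.com/JuliaLang/julia.git
cd julia
git checkout master   # CUDAnative currently needs julia/master
make -j4              # also builds the bundled LLVM; takes a while
```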


Thank you. I was doing git checkout 0.5, which converted the version to 0.6.

I read it carefully this time. Thank you very much. CUDAnative is installed and the tests passed.