I wrote a simple extension of the CuArrays.jl tutorial section on how to iterate a value function on the GPU.
Very nice. I wrote a similar tutorial for parallel VFI here, but it doesn’t do GPU.
It doesn’t look like solving the problem on the GPU is much faster than using threads on the CPU, but maybe it’s problem specific. Do you know if interpolation (splines) works on the GPU?
Everything works on the GPU, but you will have to implement most of the stuff yourself; most packages won’t be available in your kernel function.
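Interpolation is a good example: something hand-rolled like the sketch below (assuming a uniform grid; the function name and signature are made up for illustration) compiles fine inside a kernel, whereas most interpolation packages won’t.

```julia
# Hand-rolled linear interpolation on a uniform grid. A plain, allocation-free
# function like this can be called from inside a CUDA.jl kernel, unlike most
# interpolation packages (Interpolations.jl etc.), which won't compile there.
function lininterp(grid_min, grid_step, vals, x)
    n = length(vals)
    pos = (x - grid_min) / grid_step + 1      # fractional position on the grid
    j = clamp(floor(Int, pos), 1, n - 1)      # left bracket, clamped to bounds
    w = pos - j                               # weight on the right neighbor
    return (1 - w) * vals[j] + w * vals[j + 1]
end
```

Outside a kernel it behaves like ordinary linear interpolation, e.g. `lininterp(0.0, 0.5, [0.0, 1.0, 4.0, 9.0], 0.75)` returns `2.5`.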
In terms of performance, I learned that this needs a lot of caution about the specifics of your hardware. I’m not an expert by any stretch of the imagination, so I’m sure one could do better here.
Hi Florian,
Not sure you’re still involved with this after 2 years, but I recently invested some time into running the VFI on GPU. I have one comment and one question.
Regarding the performance: I’m not sure if it’s due to recent updates to CUDA in Julia (it seems CUDAnative and CuArrays are now merged together) or to the hardware (I didn’t see which CPU you were testing the model on), but I got quite a difference between CPU and GPU when running a large problem (I tried with a total of 75,000 and 175,000 assets*income points): a speed-up by a factor of roughly 15.
Now regarding my question. I was curious: why did you choose to index the value function inside the kernel using a linear index, as in
V[jx + nx*(je-1 + ne*(age))]
Given that you extract the cartesian indexes anyway, wouldn’t it be easier to use those directly? As I am planning to implement something similar with 4-5D arrays, it could get a bit messy…
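Just to make sure I read the formula right, here is the equivalence on a plain (CPU) Julia array, with made-up grid sizes and `age` taken as 0-based, as in the snippet above:

```julia
nx, ne, T = 4, 3, 5                            # asset, income, age grid sizes (made up)
V = reshape(collect(1.0:nx*ne*T), nx, ne, T)   # column-major layout, same as a CuArray

jx, je, age = 2, 3, 1                          # age is 0-based in the kernel formula
lin = jx + nx*(je - 1 + ne*age)                # the hand-written linear index
V[lin] == V[jx, je, age + 1]                   # true: both address the same element
```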
Thanks, and I hope you still remember what I’m talking about!
hey! cool! first of all, that was my only experience with custom GPU kernels (that worked), so I’m really no expert. Whether your example runs faster than mine could be due to better hardware or to your better use of it.
The linear indexing thing: good question, I have no idea tbh. It used to be the case that on custom arrays, linear indexing was faster. I don’t think that’s true anymore for base arrays in modern julia (they all implement a fast indexing method, i.e. the linear index), but I might be wrong. Anyway, I don’t actually know whether cartesian indexing is available on the GPU in a custom kernel, so I wrote out the linear index to be sure it would work inside the kernel. Notice that this is different from cartesian indexing on a CuArray in your julia code outside the kernel (where it will work). Anyway, it would be great to see your example to learn a bit more about this!
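For what it’s worth, here is roughly the pattern I mean, with the linear index written out by hand and the cartesian coordinates recovered from it when needed. This is a sketch assuming a recent CUDA.jl; the kernel name, the payload, and the launch sizes are just for illustration:

```julia
using CUDA

# Sketch: each thread handles one linear index i into the flattened V,
# then recovers (jx, je, age) from i if the payload needs them.
function fill_kernel!(V, nx, ne, T)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= nx * ne * T
        jx  = (i - 1) % nx + 1              # asset index, 1-based
        je  = ((i - 1) ÷ nx) % ne + 1       # income index, 1-based
        age = (i - 1) ÷ (nx * ne)           # 0-based, as in V[jx + nx*(je-1 + ne*age)]
        @inbounds V[i] = jx + je + age      # placeholder payload
    end
    return nothing
end

nx, ne, T = 100, 50, 15
V = CUDA.zeros(Float32, nx * ne * T)
@cuda threads=256 blocks=cld(nx * ne * T, 256) fill_kernel!(V, nx, ne, T)
```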
I see, thanks for the reply!
Once I get the kernel with big arrays to work, I will let you know. Hopefully a promising project!