VFI algorithm (Econ) on the GPU tutorial

floswald · March 9, 2019, 3:48pm

I wrote a simple extension of the CuArrays.jl tutorial section on how to iterate a value function on the GPU.

https://floswald.github.io/html/vfi.html
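The core of value function iteration (VFI) is the Bellman update, and each grid point can be updated independently of the others. That independence is what lets a custom GPU kernel assign one thread per grid point, and it is also what lets the same loop run on CPU threads. The sketch below is not the tutorial's model. It assumes a hypothetical deterministic savings problem with log utility, fixed income `y`, interest rate `r` and discount factor `β`, and the names `bellman_update!` and `solve_vfi` are illustrative only.

```julia
using Base.Threads

# Hypothetical deterministic savings model (NOT the tutorial's model):
#   V(a) = max_{a'} log((1 + r) * a + y - a') + β * V(a')
# where a and a' are both restricted to the same asset grid.

"""
    bellman_update!(Vnew, V, agrid; r, y, β)

Apply the Bellman operator once. Each grid point `i` is computed independently,
so the outer loop can be split across CPU threads (as here) or GPU threads.
"""
function bellman_update!(Vnew, V, agrid; r = 0.03, y = 1.0, β = 0.95)
    n = length(agrid)
    @threads for i in 1:n
        cash = (1 + r) * agrid[i] + y
        best = -Inf
        for j in 1:n
            c = cash - agrid[j]
            c > 0 || continue                 # skip infeasible choices
            best = max(best, log(c) + β * V[j])
        end
        Vnew[i] = best
    end
    return Vnew
end

"""
    solve_vfi(agrid; tol, maxiter, kwargs...)

Iterate the Bellman operator until the sup-norm change is below `tol`.
Extra keyword arguments are forwarded to `bellman_update!`.
"""
function solve_vfi(agrid; tol = 1e-8, maxiter = 2_000, kwargs...)
    V = zeros(length(agrid))
    Vnew = similar(V)
    for it in 1:maxiter
        bellman_update!(Vnew, V, agrid; kwargs...)
        err = maximum(abs.(Vnew .- V))
        V, Vnew = Vnew, V                     # swap buffers instead of copying
        err < tol && return V, it
    end
    return V, maxiter
end

agrid = range(0.0, 10.0; length = 200)
V, iters = solve_vfi(agrid)
```

Swapping the two buffers avoids allocating a new array on every iteration, which matters even more when the arrays live in GPU memory.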

aaowens · March 9, 2019, 4:17pm

Very nice. I wrote a similar tutorial for parallel VFI here, but it doesn’t do GPU.

It doesn’t look like solving the problem on the GPU is much faster than using threads on the CPU, but maybe it’s problem specific. Do you know if interpolation (splines) works on the GPU?

floswald · March 16, 2019, 7:19am

Everything works on the GPU, but you will have to implement most of the stuff yourself: most packages won’t be available inside your kernel function.
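As an illustration of "implementing it yourself", here is a hand-written interpolation routine that uses only scalar arithmetic and array indexing, with no allocation and no package calls, which is the kind of code that can be called from inside a kernel. Note that it does linear interpolation on a uniform grid rather than splines, and the function name `interp_uniform` is hypothetical.

```julia
"""
    interp_uniform(x, x0, dx, vals)

Linearly interpolate `vals`, given on the uniform grid x0, x0 + dx, x0 + 2dx, ...,
at the point `x`. Points outside the grid are clamped to the end values.
"""
@inline function interp_uniform(x, x0, dx, vals)
    n = length(vals)
    t = (x - x0) / dx                  # position measured in grid steps
    t <= 0 && return vals[1]           # clamp below the grid
    t >= n - 1 && return vals[n]       # clamp above the grid
    k = unsafe_trunc(Int, t) + 1       # left bracketing index (1-based)
    w = t - (k - 1)                    # weight on the right neighbour, in [0, 1)
    return (1 - w) * vals[k] + w * vals[k + 1]
end

vals = [0.0, 0.25, 1.0, 2.25, 4.0]     # x^2 sampled on 0:0.5:2
interp_uniform(0.75, 0.0, 0.5, vals)   # halfway between 0.25 and 1.0
```

`unsafe_trunc` is used instead of `trunc` because `trunc(Int, x)` checks whether the result is representable and can throw an `InexactError`; here `t` is already known to lie strictly inside the grid, so the check is unnecessary.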

In terms of performance, I learned that the results depend heavily on the specifics of your hardware. I’m not an expert by any stretch of the imagination, so I’m sure one could do better here.

myroslav · February 1, 2021, 11:07am

Hi Florian,

Not sure you’re still involved with this after 2 years, but I recently invested some time into running the VFI on GPU. I have one comment and one question.

Regarding performance: I’m not sure whether it’s due to recent updates to CUDA support in Julia (it seems CUDAnative and CuArrays have now been merged into CUDA.jl) or to the hardware (I didn’t see which CPU you tested the model on), but I got quite a difference between CPU and GPU on a large problem (I tried a total of 75,000 and 175,000 asset × income grid points): a speed-up of roughly a factor of 15.
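For problems of this size, each grid point is typically handled by one GPU thread, so the host must launch enough blocks to cover all points, and each thread must map its global index back to an (asset, income) pair. In a CUDA.jl kernel the global 1-based index is usually computed as `(blockIdx().x - 1) * blockDim().x + threadIdx().x`. The CPU-only sketch below shows just the bookkeeping; the helper names `nblocks` and `thread_to_point`, the 256 threads per block, and the split of 75,000 points into 100 asset × 750 income points are all assumptions for illustration.

```julia
# Hypothetical helpers (illustrative names, not part of CUDA.jl).

"Number of blocks needed so that `threads` per block cover `npoints` grid points."
nblocks(npoints, threads) = cld(npoints, threads)

"""
    thread_to_point(i, nx, ne)

Map a 1-based global thread index `i` to (asset, income) indices on an nx × ne grid,
or return `nothing` for the surplus threads in the last block.
"""
function thread_to_point(i, nx, ne)
    i > nx * ne && return nothing      # out-of-range threads do no work
    jx = (i - 1) % nx + 1              # asset index varies fastest (column-major)
    je = div(i - 1, nx) + 1            # income index
    return jx, je
end

nblocks(75_000, 256), nblocks(175_000, 256)
```

With 256 threads per block this gives 293 and 684 blocks; the last block in each case has some idle threads, which is why the range check in `thread_to_point` is needed.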

Now regarding my question. I was curious why you chose to index the value function inside the kernel with a linear index:

V[jx + nx*(je-1 + ne*(age))]

Given that you extract Cartesian indices anyway, wouldn’t it be easier to use them directly? I am planning to implement something similar with 4-5D arrays, where this could get a bit messy…
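Because Julia arrays are column-major, if `V` is an `nx × ne × T` array then the expression `V[jx + nx*(je-1 + ne*(age))]`, with 1-based `jx` and `je`, addresses the same element as `V[jx, je, age+1]`. The sketch below checks this on a small CPU array with hypothetical dimensions, then shows how `LinearIndices` and `CartesianIndices` from Base do the same conversion for a 5D array without a hand-written formula. It does not test anything GPU-specific.

```julia
# Hypothetical dimensions: nx asset points, ne income states, T periods.
nx, ne, T = 4, 3, 5
V = rand(nx, ne, T)

# Hand-written linear index, as in the kernel expression above.
linidx(jx, je, age, nx, ne) = jx + nx * (je - 1 + ne * age)

# Column-major layout: the linear index hits V[jx, je, age + 1].
for age in 0:T-1, je in 1:ne, jx in 1:nx
    @assert V[linidx(jx, je, age, nx, ne)] == V[jx, je, age + 1]
end

# For 4-5D arrays, Base does the bookkeeping in both directions.
W = rand(2, 3, 4, 5, 6)
L = LinearIndices(W)
C = CartesianIndices(W)
k = L[2, 1, 3, 4, 5]                              # Cartesian -> linear
@assert C[k] == CartesianIndex(2, 1, 3, 4, 5)     # linear -> Cartesian
@assert W[k] == W[2, 1, 3, 4, 5]
```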

Thanks, and I hope you still remember what I’m talking about!

floswald · February 1, 2021, 11:21am

hey, cool! first of all, that was my only experience with custom GPU kernels (that worked), so I’m really no expert. Whether your example runs faster than mine could be due to better hardware or to your better use of it.

the linear indexing thing: good question, I have no idea, to be honest. It used to be the case that linear indexing was faster on custom arrays. I don’t think that’s true anymore in modern Julia with base arrays (they all implement a fast indexing method, i.e. the linear index), but I might be wrong. Anyway, I don’t actually know whether Cartesian indexing is available on the GPU inside a custom kernel, so I wrote the linear index to be sure that it would work in the kernel. Notice that this is different from Cartesian indexing on a CuArray in your ordinary Julia code (where it will work). Anyway, it would be great to see your example and learn a bit more about this!

myroslav · February 1, 2021, 4:18pm

I see, thanks for the reply!

Once I get the kernel working with big arrays I will let you know. Hopefully a promising project!
