Is this a good GPU use case?

I would like to perform millions of parallel discrete optimization computations loosely like the following:

later      = [9, 8, 4, 1, 0]
today      = [1, 2, 3, 4, 5]
best_value = -Inf
best_index = 0

for index in 1:5
    value = today[index] + 1/2 * later[index]
    if value > best_value            
        best_index = index
        best_value = value
    end
end

julia> print(best_index)
2

In other words, each parallel task is to look at a collection of possible continuation values from a choice and a collection of immediate values from making the same choice, and then select the pair that maximizes the total value. It should then record the choice and repeat. (This is part of a backwards-recursive optimization of a utility function for an application in economics.)
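For concreteness, here is a minimal sketch (hypothetical names; it assumes each independent problem can be stored as one column of matrices T and L) of wrapping that inner loop in a function and mapping it over many problems, which is the part I would like to parallelise:

# Hypothetical wrapper around the loop above; returns the maximizing index and value.
function best_choice(today, later)
    best_value = -Inf
    best_index = 0
    for index in eachindex(today)
        value = today[index] + 1/2 * later[index]
        if value > best_value
            best_index = index
            best_value = value
        end
    end
    return best_index, best_value
end

# One independent problem per column of T (today values) and L (later values).
results = [best_choice(view(T, :, j), view(L, :, j)) for j in 1:size(T, 2)]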

On the one hand, at a low level all of the computations are just basic math operations (+, -, *, / and so on) plus some exponentiation, so it seems this might be GPU-able.

On the other hand:

  • It’s not clear to me whether GPUs can handle this type of logic, even in the simple example above.
  • The actual problem involves more complicated data structures that are not fixed-size arrays; e.g., the map from the today choice to the continuation value is not just a matched index, but requires some computation to figure out which entry in the collection of continuation values corresponds to a given today choice.
  • The actual problem involves custom types that are somewhat involved.

I have no prior GPU programming experience, and I am wondering whether a GPU could tackle a problem like this and, if so, how one would set up an analogous MWE on a GPU to start adapting the code for it.

Grateful for any thoughts; thank you!


I’ve done a (fairly simple) value function iteration problem using GPUs in Julia. The way I did it (I’m not very experienced either) was to have parallelisation at the state level, with each thread handling the maximisation problem in the kernel, so it sounds like what you are after. RE the types, in my experience the GPU kernels require a lot of things to be changed, so you might have to just re-write large portions of the problem setup. Let me know if this is the kind of thing that you are after and I’ll post a MWE.
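To give a flavour of that approach, here is a rough sketch using CUDA.jl (not the promised MWE; the array layout, names, sizes, and the Float32 choice are all assumptions): one thread per state, with each thread running the small maximisation loop from the question and writing its result.

using CUDA

# One column per state, one row per choice; all names and sizes are made up.
function best_choice_kernel!(best_index, best_value, today, later)
    s = (blockIdx().x - 1) * blockDim().x + threadIdx().x   # one state per thread
    if s <= size(today, 2)
        bi = 0
        bv = -Inf32
        for i in 1:size(today, 1)
            v = today[i, s] + 0.5f0 * later[i, s]
            if v > bv
                bi = i
                bv = v
            end
        end
        best_index[s] = bi
        best_value[s] = bv
    end
    return nothing
end

nchoices, nstates = 5, 10_000
today_d = CUDA.rand(Float32, nchoices, nstates)
later_d = CUDA.rand(Float32, nchoices, nstates)
best_index_d = CUDA.zeros(Int, nstates)
best_value_d = CUDA.fill(-Inf32, nstates)

threads = 256
blocks = cld(nstates, threads)
@cuda threads=threads blocks=blocks best_choice_kernel!(best_index_d, best_value_d, today_d, later_d)

The results come back to the CPU with Array(best_index_d) and Array(best_value_d) when needed.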

This is a use case that the GPU can handle.

But first try to run in parallel on the CPU: start julia with -t auto and use the package ThreadsX.jl. Then simply run something like

ThreadsX.mapreduce((x, y) -> x + 0.5*y, max, today, later)
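The call above returns the best total value; if you also need which index attains it (as in the loop in the question), one variation (assuming ThreadsX.mapreduce accepts multiple iterators, as in the call above) is to reduce over (value, index) tuples, which max compares lexicographically:

using ThreadsX

# Pair each candidate value with its index; max on tuples keeps the pair with the largest value.
best_value, best_index = ThreadsX.mapreduce((t, l, i) -> (t + 0.5*l, i), max, today, later, eachindex(today))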

If the above is not fast enough, then try the GPU: simply copy the arrays over as GPU arrays and execute mapreduce without the ThreadsX prefix.
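For example, a minimal sketch with CUDA.jl (assuming an NVIDIA GPU, and that CUDA.jl's mapreduce accepts multiple device arrays in the same way as the ThreadsX call above):

using CUDA

today_d = CuArray(Float32.(today))   # copy the data to the GPU
later_d = CuArray(Float32.(later))

best = mapreduce((x, y) -> x + 0.5f0*y, max, today_d, later_d)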
