Is this a good GPU use case?

I would like to perform millions of parallel discrete optimization computations loosely like the following:

later      = [9, 8, 4, 1, 0]
today      = [1, 2, 3, 4, 5]
best_value = -Inf
best_index = 0

for index in 1:5
    value = today[index] + 1/2 * later[index]
    if value > best_value            
        best_index = index
        best_value = value
    end
end

julia> print(best_index)
2

In other words, each parallel task is to look at a collection of possible continuation values from a choice and a collection of immediate values from making the same choice, and then select the pair that maximizes the total value. It should then record the choice and repeat. (This is part of a backwards-recursive optimization of a utility function for an application in economics.)
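For concreteness, here is a minimal sketch (hypothetical names; it assumes each independent problem can be stored as one column of matrices T and L) of wrapping that inner loop in a function and mapping it over many problems, which is the part I would like to parallelise:

# Hypothetical wrapper around the loop above; returns the maximizing index and value.
function best_choice(today, later)
    best_value = -Inf
    best_index = 0
    for index in eachindex(today)
        value = today[index] + 1/2 * later[index]
        if value > best_value
            best_index = index
            best_value = value
        end
    end
    return best_index, best_value
end

# One independent problem per column of T (today values) and L (later values).
results = [best_choice(view(T, :, j), view(L, :, j)) for j in 1:size(T, 2)]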

On the one hand, at a low level all of the computations are just basic math operations (+, -, *, / and so on) plus some exponentiation, so it seems this might be GPU-able.

On the other hand:

  • It’s not clear to me whether GPUs can handle this type of logic, even in the simple example above.
  • The actual problem involves more complicated data structures that are not fixed-size arrays; e.g., the map from the today choice to the continuation value is not just a matched index, but requires some computation to figure out which entry in the collection of continuation values corresponds to a given today choice.
  • The actual problem involves custom types that are somewhat involved.

I have no prior GPU programming experience, and I am wondering whether a GPU could tackle a problem like this and, if so, how one would set up an analogous MWE on a GPU to start adapting the code for it.

Grateful for any thoughts; thank you!


I’ve done a (fairly simple) value function iteration problem using GPUs in Julia. The way I did it (I’m not very experienced either) was to have parallelisation at the state level, with each thread handling the maximisation problem in the kernel, so it sounds like what you are after. RE the types, in my experience the GPU kernels require a lot of things to be changed, so you might have to just re-write large portions of the problem setup. Let me know if this is the kind of thing that you are after and I’ll post a MWE.
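To give a flavour of that approach, here is a rough sketch using CUDA.jl (not the promised MWE; the array layout, names, sizes, and the Float32 choice are all assumptions): one thread per state, with each thread running the small maximisation loop from the question and writing its result.

using CUDA

# One column per state, one row per choice; all names and sizes are made up.
function best_choice_kernel!(best_index, best_value, today, later)
    s = (blockIdx().x - 1) * blockDim().x + threadIdx().x   # one state per thread
    if s <= size(today, 2)
        bi = 0
        bv = -Inf32
        for i in 1:size(today, 1)
            v = today[i, s] + 0.5f0 * later[i, s]
            if v > bv
                bi = i
                bv = v
            end
        end
        best_index[s] = bi
        best_value[s] = bv
    end
    return nothing
end

nchoices, nstates = 5, 10_000
today_d = CUDA.rand(Float32, nchoices, nstates)
later_d = CUDA.rand(Float32, nchoices, nstates)
best_index_d = CUDA.zeros(Int, nstates)
best_value_d = CUDA.fill(-Inf32, nstates)

threads = 256
blocks = cld(nstates, threads)
@cuda threads=threads blocks=blocks best_choice_kernel!(best_index_d, best_value_d, today_d, later_d)

The results come back to the CPU with Array(best_index_d) and Array(best_value_d) when needed.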

This is a use case that the GPU can handle.

But first try to run in parallel on the CPU: start julia with -t auto and use the package ThreadsX.jl. Then simply run something like

ThreadsX.mapreduce((x, y) -> x + 0.5*y, max, today, later)
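The call above returns the best total value; if you also need which index attains it (as in the loop in the question), one variation (assuming ThreadsX.mapreduce accepts multiple iterators, as in the call above) is to reduce over (value, index) tuples, which max compares lexicographically:

using ThreadsX

# Pair each candidate value with its index; max on tuples keeps the pair with the largest value.
best_value, best_index = ThreadsX.mapreduce((t, l, i) -> (t + 0.5*l, i), max, today, later, eachindex(today))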

If the above is not fast enough, then try the GPU: simply copy the arrays over as GPU arrays and execute mapreduce without the ThreadsX prefix.
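For example, a minimal sketch with CUDA.jl (assuming an NVIDIA GPU, and that CUDA.jl's mapreduce accepts multiple device arrays in the same way as the ThreadsX call above):

using CUDA

today_d = CuArray(Float32.(today))   # copy the data to the GPU
later_d = CuArray(Float32.(later))

best = mapreduce((x, y) -> x + 0.5f0*y, max, today_d, later_d)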
