Value Function Iteration on GPU

Hi,
I am trying to understand how to do the very basic Value Function Iteration in an Overlapping Generations Model using the GPU. So far it works well on the CPU, but I am starting to run into trouble once I increase the number of grid points and make my model more complicated. Here is a simple example below that I would like to re-do to run on GPU. I am using JuliaPro 0.6.4.1 on Windows.
I tried looking at the CUDAnative, GPUarrays, CLArrays (as well as in GitHub - giob1994/GPU-OpenCL-VFI-in-Julia: GPU + OpenCL Value Function Iteration for Macro Models) but I don’t understand where to start exactly and how to translate all the functions and arrays I have so they can run on GPU.
First, I create my module with all the functions and types (apologies it’s quite large):

module mymodule

using QuantEcon: tauchen

export CreateAllParams, create_nextVV, gs_age_J_r, gs_age_j_r, income, Tax

mutable struct SParams

    # demographics
    J :: Int64
    Jret  :: Int64

    # preferences
    beta :: Float64
    sigma  :: Float64
    phi  :: Float64
    gamma  :: Float64
    vartheta  :: Float64
    flatbar  :: Float64
    psi  :: Float64
    omega  :: Float64
    psiu  :: Float64

    # endowments
    rhoz  :: Float64
    sigmaz  :: Float64
    sigmaz0  :: Float64
    xij :: Vector{Float64}
    Theta  :: Float64

    # housing
    deltah  :: Float64
    kappah  :: Float64
    alpha  :: Float64
    psir  :: Float64
    Lbar  :: Float64
    deltadh  :: Float64

    # financial instruments
    qb  :: Float64
    rb  :: Float64
    iota  :: Float64
    rm  :: Float64
    lambdab  :: Float64

    # government
    tau0y  :: Float64
    tau1y  :: Float64
    mbar  :: Float64
    rhoss  :: Float64
    tauh  :: Float64


    # credit conditions
    kappam  :: Float64
    zetam  :: Float64
    lambdam  :: Float64
    lambdapi  :: Float64
    qmort  :: Float64

end


mutable struct SGrids

    Ht_grid :: Vector{Float64}

    n_ht :: Int64

    Pepsilon :: Matrix{Float64}
    epsilon :: Vector{Float64}

    pension_grid :: Vector{Float64}

    B_grid :: Vector{Float64}

    income_grid :: Matrix{Float64}

end

mutable struct SArrays

    vvv_opt_r :: Array{Float64}
    policy_opt_r :: Array{Float64}

    g_opt_r ::   Array{Int64}

end

mutable struct SSim

    population :: Float64
    Weight :: Array{Float64}
    A :: Array{Int64}
    Y :: Array{Int64}
    C :: Array{Float64}
    Htilde :: Array{Int64}

    Aprime :: Array{Int64}
    Htildeprime :: Array{Int64}
    Adead :: Array{Int64}

    G_r :: Array{Bool}
    Inc :: Array{Float64}

end

mutable struct SAll

    p :: SParams
    g :: SGrids
    a :: SArrays
    s :: SSim

end

function FillParams(rb = 0.03, zetam = 0.15, price=1.0, J = 60, Jret=44, beta=0.964, sigma=2.0, phi=0.16, gamma=0.8, vartheta=0.0, flatbar=7.7, psi=5.0,omega=1.015,psiu=5.0,rhoz=0.75,sigmaz=0.08,sigmaz0=0.04,xij=zeros(44),Theta=2.0,deltah=0.015,kappah=0.07,alpha=0.6,psir=0.008,Lbar=0.0,deltadh=0.22,qb=0.0,iota=0.33,rm=0.0,lambdab=0.2,tau0y=0.75,tau1y=0.151,mbar=20.0,rhoss=0.6,tauh=0.01,kappam=0.04,lambdam=0.95,lambdapi=0.4,qmort=0.0)

    p = SParams(J, Jret, beta, sigma, phi, gamma, vartheta, flatbar, psi,omega,psiu,rhoz,sigmaz,sigmaz0,xij,Theta,deltah,kappah,alpha,psir,Lbar,deltadh,qb,rb,iota,rm,lambdab,tau0y,tau1y,mbar,rhoss,tauh,kappam,zetam,lambdam,lambdapi,qmort)
    p.vartheta = 1/p.sigma
    p.qb = 1/(p.rb+1)
    p.rm = p.rb*(1+p.iota)
    for jjj = 1:p.Jret
        p.xij[jjj] = 0.4078515 + 0.0668822*jjj - 0.0012752*(jjj)^2;
        #p.xij[jjj] = 0.6078515 + 0.0668822*jjj - 0.0012752*(jjj)^2;
    end
    p.qmort = 1-p.zetam
    p.Lbar = 0.05*p.Theta/((p.Theta/(p.alpha*price))^(1/(1-p.alpha)))

    return p

end

function income(j, idshock, p :: SParams)
    out = p.Theta*p.xij[j]*exp.(idshock);
    return out
end


function pension(idshock, p :: SParams)
    out = p.rhoss*income(p.Jret,idshock,p)
    return out
end

function utility(c,s, p :: SParams)
    out = (((1-p.phi)*c^(1-p.gamma)+p.phi*s^(1-p.gamma))^((1-p.vartheta)/(1-p.gamma))-1)/(1-p.vartheta);
    return out
end

function bequest(flat, p :: SParams)
    out = p.psi*((flat+p.flatbar)^(1-p.vartheta)-1)/(1-p.vartheta);
    return out
end

function Tax(y,m, p :: SParams)
    mded = min(m,p.mbar);
    T = p.tau0y*(y-p.rm*mded)^(1-p.tau1y);
    return T
end

function value_age_J_r(bprime, htilde, b, y, current_rent, p :: SParams)
    c = b + y - Tax(y,0,p)-p.qb*bprime-current_rent*htilde;
    flat = bprime;
    s = htilde;
    if c <= 0
        return -Inf
    elseif bprime < 0
        return -Inf
    else
        vv = utility(c,s,p) + p.beta*bequest(flat,p);
        return vv
    end
end

function value_r(j, bprime, htilde, b, y, current_rent, nextVV, p :: SParams)
    c = b + y - Tax(y,0,p) - p.qb*bprime - current_rent*htilde;
    s = htilde;
    if c <= 0
        return -Inf
    elseif bprime < 0
        return -Inf
    else
        vv = utility(c,s,p) + p.beta*nextVV;
        return vv
    end
end


function create_income_grid(n_z, epsilon, pension_grid, p :: SParams)

    income_grid = zeros(Float64,n_z,p.J)
    for j = 1:p.Jret
        for i_z = 1:n_z
            income_grid[i_z,j] = income(j,epsilon[i_z],p)
        end
    end
    for j = p.Jret+1:p.J
        for i_z = 1:n_z
            income_grid[i_z,j] = pension_grid[i_z]
        end
    end

    return income_grid

end

function create_nextVV(j, i_z, n_b, Pepsilon, vvv_opt_r, p :: SParams)

    VVVn = zeros(n_b)

    if j >= p.Jret
        VVVn = vvv_opt_r[i_z,:,j+1]
    elseif j == p.Jret
        VVVn = vvv_opt_r[i_z,:,j+1]
    else
        VVVn = (Pepsilon[i_z,:]'*vvv_opt_r[:,:,j+1])'
    end
    return VVVn
end

function construct_grids(n_z :: Int64, n_b :: Int64, p :: SParams)


    Ht_grid = [1.17; 1.50; 1.92]

    n_ht = length(Ht_grid)

    mc = tauchen(n_z, p.rhoz, p.sigmaz, 0, 2)
    Pepsilon = mc.p
    epsilon = mc.state_values

    pension_grid = zeros(n_z)

    B_grid = linspace(0,10,n_b)

    for i_z = 1:n_z
        pension_grid[i_z] = pension(epsilon[i_z],p);
    end

    income_grid = create_income_grid(n_z, epsilon, pension_grid, p)

    grids = SGrids(Ht_grid, n_ht, Pepsilon, epsilon, pension_grid, B_grid, income_grid)

    return grids

end

function create_empty_arrays(n_z :: Int64, n_b :: Int64, p :: SParams)

    vvv_opt_r    =          Array{Float64}(n_z,n_b,p.J)
    policy_opt_r =          Array{Float64}(3,n_z,n_b,p.J)

    g_opt_r =               Array{Int64}(n_z,n_b,p.J)

    arrays = SArrays(vvv_opt_r, policy_opt_r, g_opt_r)
    return arrays

end


function create_sim(N,J)

    population              = N*J
    Weight                  = 1/population*ones(N,J)

    A                       = zeros(Int64, N,J)
    Y                       = zeros(Int64, N,J)
    C                       = zeros( N,J)

    Htilde                  = zeros(Int64, N,J)

    Aprime                  = zeros(Int64, N,J)
    Htildeprime             = zeros(Int64, N,J)
    Adead                   = zeros(Int64, N,J)

    G_r                     = zeros(Bool, N,J)
    Inc                     = zeros(N,J)

    sims = SSim(population,Weight,A,Y,C,Htilde,Aprime,Htildeprime,Adead,G_r,Inc)

    return sims

end

function gs_age_J_r(b :: Float64, y :: Float64, B_grid, n_b, Ht_grid, n_ht, current_rent, p :: SParams)

    vvv_r = zeros(n_ht, n_b)
    policy_r = zeros(3)

    for i_ht = 1:n_ht
        for i_b = 1:n_b
            vvv_r[i_ht, i_b] = value_age_J_r(B_grid[i_b],Ht_grid[i_ht],b,y,current_rent,p)
        end
    end
    (vvv_opt_r, idx) = findmax(vvv_r)
    (idxh, idxb) = ind2sub(vvv_r, idx)
    policy_r[1] = idxb
    policy_r[2] = idxh
    policy_r[3] = b + y - Tax(y,0,p) - p.qb*B_grid[idxb]-current_rent*Ht_grid[idxh]

    return vvv_opt_r, policy_r


end

function gs_age_j_r(j :: Int64, b :: Float64, y :: Float64, B_grid, n_b, Ht_grid, n_ht, current_rent, VVVn :: Array{Float64,1}, p :: SParams)

    vvv_r = zeros(n_ht, n_b)
    policy_r = zeros(3)
    for i_ht = 1:n_ht
        for i_b = 1:n_b
            vvv_r[i_ht, i_b] = value_r(j, B_grid[i_b],Ht_grid[i_ht],b,y,current_rent,VVVn[i_b],p)
        end
    end
    (vvv_opt_r, idx) = findmax(vvv_r)
    (idxh, idxb) = ind2sub(vvv_r, idx)
    policy_r[1] = idxb
    policy_r[2] = idxh
    policy_r[3] = b + y - Tax(y,0,p) - p.qb*B_grid[idxb]-current_rent*Ht_grid[idxh]

    return vvv_opt_r, policy_r

end

function CreateAllParams(n_z, n_b, N)

    params = FillParams()
    grids = construct_grids(n_z, n_b, params)
    arrays = create_empty_arrays(n_z, n_b, params)
    sims = create_sim(N, params.J)

    alloutput = SAll(params,grids,arrays,sims)

    return alloutput

end


end

Then I run the main file

using mymodule

n_b = 80
n_z = 3
n_ht = 3
N = 100

B = CreateAllParams(n_z, n_b, N)

j = 60
for i_z in 1:n_z, i_b in 1:n_b
    B.a.vvv_opt_r[i_z,i_b,j], B.a.policy_opt_r[:,i_z,i_b,j] = gs_age_J_r(B.g.B_grid[i_b],B.g.pension_grid[i_z], B.g.B_grid, n_b, B.g.Ht_grid, B.g.n_ht, 0.1, B.p)
end


for j =59:-1:1
    for i_z in 1:n_z, i_b in 1:n_b
        VVVn = create_nextVV(j, i_z, n_b, B.g.Pepsilon, B.a.vvv_opt_r, B.p)
        B.a.vvv_opt_r[i_z,i_b,j], B.a.policy_opt_r[:,i_z,i_b,j] = gs_age_j_r(j, B.g.B_grid[i_b], B.g.income_grid[i_z,j], B.g.B_grid, n_b, B.g.Ht_grid, n_ht, 0.1, VVVn, B.p)
    end
end

Essentially, for each value of j I would like to somehow send the loops for i_z in 1:n_z, i_b in 1:n_b to gpu for parallel computation (as the operation on each grid point is quite simple), collect the results and move to next j.

Thanks for help!

1 Like

If you want to use any of the array-based abstractions (CLArrays, CuArrays) you’ll need to rethink your program in terms of vectorized function calls, eg. broadcasting the gs_age_j_r over your input arrays. If the structure of your loop does not allow that, you can look at CUDAnative to write kernels where you use/convert the thread indexing intrinsics (threadidx, blockdim, blockidx, griddim) to appropriate indices for your algorithm. Either way, you’ll need to get rod of the loops and express them in terms of the model of parallelism you choose.

1 Like

Thanks for the reply @maleadt. Let me see if I understand correctly. So function gs_age_j_r, for example, accepts as argumets individual points from the grid (for example b and y are taken from the grid). You’re saying that I need to find a way to calculate that function where I supply all the grid at once (in this case B_grid and income_grid instead of b and y)? Is that correct?

And do you happen to have some simple example on how to write kernels in CUDAnative, except for the one in the documentation?

Thanks!

Yes. The broadcasting machinery will apply the underlying function to scalar elements just as it happens now, but with the parallelism encoded in the vectorized application. CuArrays can work with this, and launch kernels appropriately.

Look for books on GPU/CUDA programming in general, CUDAnative isn’t that much different in terms of abstraction. Working at this level is much more bothersome though, but gives you flexibility to express parallelism outside of the boundaries set by map/broadcast.

hey @myroslav not sure if you ever continued on this, but have a look at that: https://github.com/ealdrich/VFI

the kernel is written for you :slight_smile:

let me know if you do more on this, I wanted to spend some time on this as well. but then… :wink:

1 Like

I wrote this up in the end as a tutorial.

https://floswald.github.io/html/vfi.html

2 Likes

This is really great, thanks a lot!