Parallelizing Loss Function

Dear All,

I would like to ask you for advice on how to parallelize a loss function in Flux. Context: I am solving a discounted infinite-horizon optimization problem (in economics). It has three controls (Cs, Ci, Cr), which I parameterize with deep neural networks (four hidden layers with 32 neurons each). To approximate the infinite-horizon problem, I use a finite-horizon approximation (let's say 150 periods).

The problem is that I need to evaluate this loss function over a large grid of points (a few thousand), and that takes a lot of time. My networks have relatively small layers, so a GPU doesn't help there. Evaluation at a single point is inherently serial and can't be parallelized, but I think I could multithread/distribute the per-point evaluations across multiple CPU cores (6 cores on my laptop) or on my university's cluster.

So, I would like to ask for advice on the best way to parallelize this type of loss function on CPUs. Should I try something like FLoops.jl? I would also like to ask whether I should try FastChain from DiffEqFlux instead of a regular Chain. Does it provide a significant speed-up, or are my networks too large for it (around 8,700 parameters per network)?
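For concreteness, the FastChain version I have in mind would look roughly like this (layer sizes follow my description above; tanh is just a placeholder activation, and the names are mine):

using DiffEqFlux

# Sketch of one policy network: the three states as inputs,
# four hidden layers of 32 neurons, scalar consumption as output.
Cs_fast = FastChain(FastDense(3, 32, tanh), FastDense(32, 32, tanh),
                    FastDense(32, 32, tanh), FastDense(32, 32, tanh),
                    FastDense(32, 1))
pCs = initial_params(Cs_fast)    # explicit parameter vector
# Unlike a Chain, a FastChain is called with its parameters: Cs_fast(x, pCs)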

Example code:

#(5) Build law of motion
@unpack α,ℬ,θ,φ,𝖉,π1,π2,π3,πr,πd,ν = Mod1
# 𝛀 is a 3×N matrix of states (rows: S, I, R shares; one column per grid point)
function 𝓗(𝛀,Cs,Ci,Ns,Ni)
    𝓢 = 𝛀[1,:]' - π1.*Cs.*Ci.*𝛀[1,:]'.*𝛀[2,:]' - π2.*Ns.*Ni.*𝛀[1,:]'.*𝛀[2,:]' - π3.*𝛀[1,:]'.*𝛀[2,:]'
    𝓘 = (1-πr-πd).*𝛀[2,:]' + π1.*Cs.*Ci.*𝛀[1,:]'.*𝛀[2,:]' + π2.*Ns.*Ni.*𝛀[1,:]'.*𝛀[2,:]' + π3.*𝛀[1,:]'.*𝛀[2,:]'
    𝓡 = 𝛀[3,:]' + πr.*𝛀[2,:]'
    𝞨 = [𝓢;𝓘;𝓡]
    return 𝞨
end

#(6) Utility function
u(c,n) = log(c) - θ/2*n^2

#(7) Build objective function
function 𝓞(x)
    𝛀 = x
    𝓤 = zeros(Float32,1,size(𝛀,2))
    𝓮1 = zeros(Float32,1,size(𝛀,2))
    𝓮2 = zeros(Float32,1,size(𝛀,2))
    𝓮3 = zeros(Float32,1,size(𝛀,2))
    for i in 1:𝓣                 # 𝓣 = number of periods in the finite-horizon approximation
        cs_u = Cs(𝛀)             # Cs, Ci, Cr are the three policy networks
        ci_u = Ci(𝛀)
        cr_u = Cr(𝛀)
        # Non-negativity: clamp consumption at the floor ν
        cs = max.(cs_u,ν)
        ci = max.(ci_u,ν)
        cr = max.(cr_u,ν)
        # Accumulate penalties for violating the non-negativity constraints
        𝓮1 += max.(-cs_u.+ν,0.0000010f0)
        𝓮2 += max.(-ci_u.+ν,0.0000010f0)
        𝓮3 += max.(-cr_u.+ν,0.0000010f0)
        # Cumulate discounted reward
        𝓤 += ℬ^(i-1)*(𝛀[1,:]'.*u.(cs,cs./α) + 𝛀[2,:]'.*u.(ci,ci./(α*φ))
        + 𝛀[3,:]'.*u.(cr,cr./α) + (1 .- 𝛀[1,:]'-𝛀[2,:]'-𝛀[3,:]').*𝖉)
        𝛀 = 𝓗(𝛀,cs,ci,cs./α,ci./(α*φ))
    end
    return -sum(𝓤) + sum(𝓮1.^2) + sum(𝓮2.^2) + sum(𝓮3.^2)
end
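For reference, getting the gradient currently looks something like this (schematically; grid is my 3×N matrix of grid points), and this single call is what I would like to spread across cores:

ps = Flux.params(Cs, Ci, Cr)                # parameters of all three networks
gs = Zygote.gradient(() -> 𝓞(grid), ps)    # gradient of the loss w.r.t. ps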

Any advice/guidance would be welcome!

Best,
Honza

EDIT:

Of course, I need to parallelize in a way that still allows automatic differentiation w.r.t. the network parameters.


Supporting AD in FLoops (both sequential and parallel) or in any Transducers.jl-related package should be straightforward (I think), as long as the user-defined loop bodies and functions are themselves AD-able and the accumulator is type-stable (the latter assumption can be removed with some effort). It's been on a (long) want-to-do list of mine, but I haven't had the time to try it out. Nothing is there yet ATM.

Just FYI, how effective parallelization can be depends on the ratio of serial to parallelizable work (ref: Amdahl's law).
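For example, with 6 cores, if 90% of the work parallelizes, the best possible speedup is 1/(0.1 + 0.9/6) = 4x.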


@tkf Thank you very much! I will try FLoops! My objective function is just a sum/loop over pretty simple functions of the neural networks, so I think there shouldn't be problems in that direction.

So, should I simply write an inner function that performs the sequential loop (the evaluation at one grid point), and then use FLoops to map it over the array of grid points? Roughly like the sketch below.
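Just a sketch of what I mean; loss_one_point would be the serial 𝓣-period rollout for a single grid point, a helper I have not written yet:

using FLoops

# Hypothetical structure: parallel sum over grid points,
# with the serial per-point rollout hidden inside loss_one_point.
function loss_all(points)             # points: 3×N matrix of grid points
    @floop for x in eachcol(points)
        @reduce(total += loss_one_point(x))
    end
    return total
end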

Best,
Honza

Wait no, please don't try :slight_smile:. There is no AD support in FLoops. Sorry if my explanation was unclear.

(I meant to say that I can make it work if I tweak FLoops.jl. But users of FLoops.jl can't.)


@tkf Thank you! Is there some other tool that would allow differentiation through multithreaded/distributed code, or do I have to code it by hand?
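Or, since the loss is just a sum over grid points, would it be reasonable to split the grid into chunks, take the gradient of each chunk's loss in its own task, and add the gradients up (the gradient of a sum is the sum of the gradients)? A rough, untested sketch of what I mean (I am not sure Zygote's implicit-parameter mode is thread-safe, so this may be naive):

using Flux, Zygote
using Base.Threads: @spawn, nthreads

# Split the grid columns into chunks, differentiate the per-chunk loss in
# parallel tasks, and sum the resulting gradients. Assumes every parameter
# receives a gradient in every chunk (true here, since each chunk evaluates
# all three networks).
function threaded_gradient(loss, ps, 𝛀; nchunks = nthreads())
    chunks = Iterators.partition(1:size(𝛀, 2), cld(size(𝛀, 2), nchunks))
    tasks = [@spawn(gradient(() -> loss(𝛀[:, c]), ps)) for c in chunks]
    grads = fetch.(tasks)             # one Zygote.Grads per chunk
    total = grads[1]
    for g in grads[2:end], p in ps
        total[p] .+= g[p]             # accumulate gradients in place
    end
    return total
end

# Usage (schematically): ps = Flux.params(Cs, Ci, Cr)
#                         gs = threaded_gradient(𝓞, ps, grid)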