I would like to ask you for advice on how to parallelize a loss function in Flux. Context: I am solving a discounted infinite-horizon optimization problem (in economics). It has three controls (Cs, Ci, Cr), which I parameterize with deep neural networks (four hidden layers with 32 neurons each). To approximate the infinite-horizon problem, I use a finite-horizon approximation (let's say 150 periods).
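For concreteness, each control network looks roughly like this (a sketch, not my actual code — the input size and exact widths here are assumptions):

```julia
using Flux

# One policy network: 3 state inputs (shares S, I, R), four hidden
# layers of 32 neurons, one control output. Cs, Ci, Cr are built the same way.
Cs = Chain(Dense(3, 32, relu),
           Dense(32, 32, relu),
           Dense(32, 32, relu),
           Dense(32, 32, relu),
           Dense(32, 1))
```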
The problem is that I need to evaluate this loss function over a large grid of points (a few thousand), and that takes a lot of time. My networks have relatively small layers, so a GPU doesn't help here. Evaluation at a single point is inherently serial, but I think I could multithread/distribute the per-point evaluations across multiple CPU cores (6 cores on my laptop) or on my university's cluster.
So I would like to ask for advice on the best way to parallelize this type of loss function on CPUs. Should I try something like FLoops.jl? I would also like to ask whether I should try FastChain from DiffEqFlux instead of a regular Chain. Does it provide a significant speed-up, or are my networks too large for it (around 8700 parameters per network)?
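To make the idea concrete, here is a rough sketch of the chunked multithreading I have in mind, with a dummy per-point loss standing in for my actual evaluation (names like `pointloss` are placeholders, not my real code):

```julia
using Base.Threads

# Dummy stand-in for the expensive per-point loss evaluation.
pointloss(x) = sum(abs2, x)

# Serial baseline: loop over grid columns.
function loss_serial(Ω)
    s = 0.0
    for j in 1:size(Ω, 2)
        s += pointloss(@view Ω[:, j])
    end
    return s
end

# Threaded version: split the columns into chunks and give each chunk
# its own accumulator, so there are no data races on a shared sum.
function loss_threaded(Ω)
    n = size(Ω, 2)
    ranges = collect(Iterators.partition(1:n, cld(n, nthreads())))
    partials = zeros(length(ranges))
    @threads for k in eachindex(ranges)
        s = 0.0
        for j in ranges[k]
            s += pointloss(@view Ω[:, j])
        end
        partials[k] = s
    end
    return sum(partials)
end
```

The per-chunk accumulators are the important part; a naive `@threads` loop writing into one shared scalar would race.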
Here is the relevant part of my code:

```julia
#(5) Build law of motion
@unpack α,ℬ,θ,φ,𝖉,π1,π2,π3,πr,πd,ν = Mod1

function 𝓗(𝛀,Cs,Ci,Ns,Ni)
    𝓢 = 𝛀[1,:]' - π1.*Cs.*Ci.*𝛀[1,:]'.*𝛀[2,:]' - π2.*Ns.*Ni.*𝛀[1,:]'.*𝛀[2,:]' - π3.*𝛀[1,:]'.*𝛀[2,:]'
    𝓘 = (1-πr-πd).*𝛀[2,:]' + π1.*Cs.*Ci.*𝛀[1,:]'.*𝛀[2,:]' + π2.*Ns.*Ni.*𝛀[1,:]'.*𝛀[2,:]' + π3.*𝛀[1,:]'.*𝛀[2,:]'
    𝓡 = 𝛀[3,:]' + πr.*𝛀[2,:]'
    𝞨 = [𝓢;𝓘;𝓡]
    return 𝞨
end

#(6) Utility function
u(c,n) = log(c) - θ/2*n^2

#(7) Build objective function
function 𝓠(x)
    𝛀 = x
    # size(𝛀,2), not size(𝛀): size(𝛀) returns a tuple and zeros would error
    𝓤  = zeros(Float32,1,size(𝛀,2))
    𝓮1 = zeros(Float32,1,size(𝛀,2))
    𝓮2 = zeros(Float32,1,size(𝛀,2))
    𝓮3 = zeros(Float32,1,size(𝛀,2))
    for i in 1:𝓣    # 𝓣 is the horizon (150 periods), defined globally
        cs_u = Cs(𝛀)
        ci_u = Ci(𝛀)
        cr_u = Cr(𝛀)
        # Non-negativity: clip controls and accumulate a penalty for violations
        cs = max.(cs_u,ν)
        ci = max.(ci_u,ν)
        cr = max.(cr_u,ν)
        𝓮1 += max.(-cs_u.+ν,0.0000010f0)
        𝓮2 += max.(-ci_u.+ν,0.0000010f0)
        𝓮3 += max.(-cr_u.+ν,0.0000010f0)
        # Cumulate discounted reward
        𝓤 += ℬ^(i-1)*(𝛀[1,:]'.*u.(cs,cs./α) + 𝛀[2,:]'.*u.(ci,ci./(α*φ)) +
                      𝛀[3,:]'.*u.(cr,cr./α) + (1 .- 𝛀[1,:]'-𝛀[2,:]'-𝛀[3,:]').*𝖉)
        # Advance the state via the law of motion
        𝛀 = 𝓗(𝛀,cs,ci,cs./α,ci./(α*φ))
    end
    return -sum(𝓤) + sum(𝓮1.^2) + sum(𝓮2.^2) + sum(𝓮3.^2)
end
```
Any advice/guidance would be welcome!
Of course, I need to parallelize in a way that still allows automatic differentiation with respect to the network parameters.
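The pattern I have in mind is a mutation-free, task-parallel sum over chunks (sketch only — `chunk_loss` is a stand-in for 𝓠 restricted to a block of grid columns; in practice each chunk could be an ordinary Zygote `gradient` call whose results I add up afterwards, since Zygote cannot differentiate through threaded in-place accumulation):

```julia
using Base.Threads

# Stand-in for the real loss evaluated on one chunk of grid columns.
chunk_loss(Ω) = sum(abs2, Ω)

# Spawn one task per chunk of columns; fetch each chunk's scalar loss
# and sum. No shared mutable state, so the structure stays AD-friendly.
function loss_chunked(Ω; nchunks = nthreads())
    n = size(Ω, 2)
    cols = collect(Iterators.partition(1:n, cld(n, nchunks)))
    tasks = [Threads.@spawn chunk_loss(view(Ω, :, r)) for r in cols]
    return sum(fetch.(tasks))
end
```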