I am facing a performance issue that I find hard to understand, so I am hoping I can gain some insights here. I have written code to solve an economic problem using fixed point iteration, also known as policy function iteration. The problem is defined on the Cartesian product of a number of grids.
The issue is the following: if I increase the upper bound of two of the grids, namely kgrid and bgrid, by a factor of 2, the speed of the algorithm drops by roughly a factor of 2. I keep everything else equal, including the number of grid points. I tested the programme and got the following results:
If the upper bound is 100, the time until convergence is 877.157062 seconds.
If the upper bound is 200, the time until convergence is 1537.049566 seconds.
I have included the code in this post. I have tried profiling the code, but it did not make things clearer for me. Any suggestions to alleviate this problem would be much appreciated!
I am using Julia version 1.10.4,
Interpolations version 0.15.1,
NLsolve version 4.5.1.
That’s a lot of code and a very long runtime, so I won’t have time to dig into it, but
(1) this looks a lot like Matlab, with many non-Julian and potentially performance-reducing idioms (collect all over the place, slicing arrays without taking views, creating lots of arrays even in tight loops) and
(2) as a result when running this with just 100 iterations I get
29.862649 seconds (698.93 M allocations: 31.719 GiB, 12.22% gc time)
At that level of allocations it’s not really helpful to reason about the performance behaviour of multithreaded code. My advice is to read the Performance tips and make sure they’re all adhered to, get rid of the threading, and work on getting your hot loop allocation-free (or at least as close to that as possible; it might not be entirely achievable with NLsolve) and as fast as possible, then re-introduce threading.
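To make point (1) concrete, here is a minimal sketch of the pattern I mean; `policy`, `grid`, `buf` and the two functions are hypothetical stand-ins, not names from your code:

```julia
# Allocating style: copies a column and collects the grid on every iteration,
# and the right-hand side of the assignment creates yet another temporary.
function iterate_policy_allocating(policy, grid)
    out = similar(policy)
    for j in axes(policy, 2)
        col = policy[:, j]          # slice: copies the column each time
        x = collect(grid)           # collect: a fresh array each time
        out[:, j] = col .* x        # allocates a temporary result vector
    end
    return out
end

# Allocation-free style: views instead of slices, one preallocated buffer,
# and fused in-place broadcasts.
function iterate_policy_inplace!(out, buf, policy, grid)
    for j in axes(policy, 2)
        col = @view policy[:, j]    # view: no copy
        buf .= col .* grid          # fused broadcast into the reused buffer
        out[:, j] .= buf            # in-place write, no temporaries
    end
    return out
end

# usage: out = similar(policy); buf = zeros(size(policy, 1));
#        iterate_policy_inplace!(out, buf, policy, grid)
```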
Thank you for your suggestions. I have read the Performance tips and tried profiling allocations, but I am no expert. I tried to preallocate as many arrays as possible and to use in-place updates as well. I was able to reduce allocations to about a quarter of what they were, but this is as far as I get.
Do you perhaps have any other suggestions for decreasing the number of allocations? pfi_noslice.jl (7.0 KB)
This looks a lot better, for me the difference from old to new is:
88.116497 seconds (1.11 G allocations: 50.229 GiB, 4.38% gc time, 1.33% compilation time: 60% of which was recompilation)
24.218082 seconds (65.35 M allocations: 3.360 GiB, 1.24% gc time, 4.28% compilation time: 59% of which was recompilation)
(for 100 iterations). A couple of simple changes get me to:
19.605032 seconds (39.38 M allocations: 1.833 GiB, 0.80% gc time, 3.45% compilation time)
namely (a combined sketch follows the list):
pass FOC as an argument to function policy_function_iterate_simultaneous
Preallocate the X₀ outside the loop (X₀ = zeros(3)) and update it inside X₀ .= (c₀[i, j, z1, z2, z3], k₀[i, j, z1, z2, z3], b₀[i, j, z1, z2, z3])
Add a view to the slice of PI in the F! function: @views(PI[z,:])
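Put together, a stripped-down sketch of what I mean (array names, shapes, and the FOC! signature are hypothetical, not your actual code):

```julia
using NLsolve

# Rough sketch combining the three changes above on a simplified stand-in
# for the real policy_function_iterate_simultaneous.
function policy_function_iterate_simultaneous!(c₀, k₀, b₀, FOC!, PI)
    X₀ = zeros(3)                          # preallocated once, outside the loops
    for z3 in axes(c₀, 5), z2 in axes(c₀, 4), z1 in axes(c₀, 3),
        j in axes(c₀, 2), i in axes(c₀, 1)
        # update the initial guess in place instead of building a new vector
        X₀ .= (c₀[i, j, z1, z2, z3], k₀[i, j, z1, z2, z3], b₀[i, j, z1, z2, z3])
        # FOC! is passed in as an argument and closed over the current indices;
        # the closure and nlsolve itself will still allocate a little per call
        sol = nlsolve((F, X) -> FOC!(F, X, i, j, z1, z2, z3, PI), X₀)
        c₀[i, j, z1, z2, z3], k₀[i, j, z1, z2, z3], b₀[i, j, z1, z2, z3] = sol.zero
    end
    return c₀, k₀, b₀
end

# and inside FOC! / F!, take a view of the transition-matrix row rather than
# slicing it:  π_z = @views PI[z, :]
```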
Beyond that there’s nothing that jumps out from a cursory look, but my gut feeling is that 40M allocations and 1.8 GiB is still a lot, unless it all comes from the calls to nlsolve and is unavoidable?
So I would try to quite carefully benchmark policy_function_iterate_simultaneous to see how much it allocates and where these allocations come from - there’s likely a better way to do the updating of the FOC but I’d have to think about it more closely.
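As a rough baseline for what is unavoidable, you could benchmark a single nlsolve call on a toy system of the same size as one FOC solve (this is a stand-in, not your actual FOC):

```julia
using BenchmarkTools, NLsolve

# Toy 3-equation system standing in for one FOC evaluation
function toy_foc!(F, X)
    F[1] = X[1] + X[2] - 1.0
    F[2] = X[2] - X[3]^2
    F[3] = X[1] + X[2] + X[3] - 1.5
    return F
end

X₀ = [0.5, 0.5, 0.5]
@btime nlsolve(toy_foc!, $X₀)
# Multiply the reported allocations by the number of grid points visited per
# sweep; if that already accounts for most of the ~40M, the rest is probably
# NLsolve overhead rather than something in your own loops.
```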