Strange NaN values when mutating arrays in immutable structs

I’m working on a large economic model that involves a dynamic optimization problem. I’ve set up arrays within immutable structs to hold the solutions to the problem. The algorithm iterates on these arrays until they reach a fixed point: I keep “old” and “new” arrays, compute the distance between them and compare it with a tolerance, and then replace the “old” array with the “new” one.
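A minimal sketch of this kind of setup, assuming a toy contraction in place of the real model. The struct `Solution`, its fields `Vold`/`Vnew`, and the functions `update!` and `solve!` are hypothetical names for illustration. The point is that an immutable struct’s array fields cannot be rebound, but their contents can be overwritten in place with `.=`.

```julia
# Hypothetical container: the struct is immutable, but its fields are
# ordinary (mutable) Arrays, so their *contents* can be updated in place.
struct Solution
    Vold::Matrix{Float64}
    Vnew::Matrix{Float64}
end

# Hypothetical update operator standing in for the model's actual step:
# the contraction V -> 0.5V + 1, whose fixed point is V = 2 everywhere.
update!(Vnew, Vold) = (Vnew .= 0.5 .* Vold .+ 1.0; Vnew)

function solve!(s::Solution; tol = 1e-10, maxiter = 1_000)
    for iter in 1:maxiter
        update!(s.Vnew, s.Vold)
        dist = maximum(abs.(s.Vnew .- s.Vold))  # sup-norm distance between iterates
        s.Vold .= s.Vnew                         # copy "new" into "old" in place
        dist < tol && return iter                # converged: report iteration count
    end
    error("no convergence after $maxiter iterations")
end

s = Solution(zeros(3, 3), zeros(3, 3))
solve!(s)
```

Writing `s.Vold = s.Vnew` instead would throw, because the struct is immutable; and even with a mutable struct it would make both fields alias the same array, so the next update would overwrite the “old” values.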

I’m running into a problem where, when I mutate these arrays, the new array ends up with a small number of NaN elements and some (incorrect) massive numbers as well. I used @infiltrate to diagnose the source of these. But when I do the same assignment to a new test array, the NaN elements don’t show up. In addition, if I repeat the assignment to the struct array within the Infiltrator.jl REPL, the NaN elements sometimes disappear.

The code leading up to the assignments is quite complicated, so I’ve just provided some screenshots below to show what’s going on. The cPol, aPol, etc. variables are views into other arrays within the hh struct. I checked each of the components on the right-hand side for NaN elements and they had none.
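One way to make those checks quicker is a small helper that reports NaN, ±Inf, or implausibly large entries. This is a sketch: `report_nonfinite` and its `bound` keyword are hypothetical names, not part of any package. Applying it to intermediate results (for example, the array returned by a function like expectation_e), and not only to the inputs, helps narrow down which term introduces the bad values.

```julia
# Hypothetical helper: flag entries that are NaN, ±Inf, or larger in
# magnitude than `bound` (default Inf, i.e. only non-finite values).
# Returns true if the array looks clean, false otherwise.
function report_nonfinite(name::AbstractString, A::AbstractArray; bound::Real = Inf)
    bad = findall(x -> !isfinite(x) || abs(x) > bound, A)
    if !isempty(bad)
        println(name, ": ", length(bad), " suspicious entries, first at index ", first(bad))
    end
    return isempty(bad)
end

# Usage: prints one warning line and returns false.
A = [1.0 NaN; 3.0 1e300]
report_nonfinite("A", A; bound = 1e10)
```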

I’d greatly appreciate any insight into this strange error.

[Screenshot: assignment to a new array, test, works fine]

[Screenshot: assignment to the struct array shows NaN values]

[Screenshot: assigning again to the struct array, the NaN values disappear]

I don’t think it’s possible to help here without a test case (certainly not for me). And if the test case is large, as you said, it might still be tricky to help. But if it is indeed as you describe, then this is a bug in Julia. So it would be good if you can reduce the test case as much as possible and post again here with code to run.

Can you post the lines of code you’re showing as text instead of a screenshot? That would make it easier to debug. Also, a small example that shows the problem would be helpful for debugging (i.e., a minimal reproducible example).

Are you perhaps running with --fast-math (which can change floating-point results depending on the exact code that’s run, such as whether or not a broadcasted assignment loop is used)?

Thanks for the quick responses. I’m not using --fast-math: I haven’t enabled it anywhere in my code, and I don’t think any of the packages I’m using enable it either.

I’ll do some testing tonight and see if I can provide a minimal reproducible example.


Looks like I’ve solved the problem.

The line of code that was leading to the error was:

@views hh.WPnew[:,:,iz,iβ,iTE,iw̄] .= hh.cPol_e[:,:,iz,iβ,iTE,iw̄].^(1-σ)./(1-σ) .- ρ[iz] .* γ_s.* hh.sPol_e[:,:,iz,iβ,iTE,iw̄] .^(1 + φ_s) ./ (1 + φ_s) +
 βvals[iβ] .* expectation_e(βtrans[iβ,:], ztrans[iz,:], itpW, itpU, hh.dPol_e[:,:,iz,iβ,iTE,iw̄], hh.aPol_e[:,:,iz,iβ,iTE,iw̄], w̄P, EIbenP, zvals, βvals, 1 .- ρ[iz] .+ ρ[iz]*hh.sPol_e[:,:,iz,iβ,iTE,iw̄].*fh[iz], ρ[iz]*(1 .- hh.sPol_e[:,:,iz,iβ,iTE,iw̄].*fh[iz]))

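In the expectation_e function, I had used similar() to allocate an array. After poking around a bit on here, I learned that similar() returns an uninitialized array: its elements are whatever values happen to be in that memory, which can include NaN and huge garbage numbers. Any element that is read before it is written (for example, when accumulating into it with +=) picks up that garbage, and these values then got carried through the rest of the operations in that line of code.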

It was just luck that when I called the function again, the memory returned by similar() didn’t happen to contain any NaN values.
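A minimal sketch of the failure mode and the fix. Since the actual expectation_e code wasn’t posted, `weighted_rowsum_buggy`, `weighted_rowsum`, and the accumulation loop are hypothetical stand-ins. The fix is to initialize the buffer before accumulating into it. For debugging, filling a buffer with NaN instead of leaving it uninitialized makes any read-before-write show up as NaN on every run rather than only occasionally; here, because the loop accumulates with +=, every entry comes out NaN, which flags that the buffer needs a real starting value.

```julia
# BUGGY pattern (hypothetical): `similar` allocates without initializing,
# so `out` starts with whatever bits were in memory (possibly NaN or 1e300).
function weighted_rowsum_buggy(P::AbstractMatrix, V::AbstractMatrix)
    out = similar(V, size(V, 1))
    for j in axes(V, 2), i in axes(V, 1)
        out[i] += P[i, j] * V[i, j]   # the first += reads uninitialized memory
    end
    return out
end

# FIXED pattern: set every element to `init` before accumulating.
# init = 0 gives the correct result; init = NaN "poisons" the buffer so a
# read-before-write shows up deterministically.
function weighted_rowsum(P::AbstractMatrix, V::AbstractMatrix; init = zero(eltype(V)))
    out = fill!(similar(V, size(V, 1)), init)
    for j in axes(V, 2), i in axes(V, 1)
        out[i] += P[i, j] * V[i, j]
    end
    return out
end

P = [0.5 0.5; 0.25 0.75]
V = [1.0 3.0; 2.0 4.0]
weighted_rowsum(P, V)              # correct: starts from zeros
weighted_rowsum(P, V; init = NaN)  # all NaN: the loop depends on initialization
```

`zeros(eltype(V), size(V, 1))` is an equivalent fix. `similar`, like `Vector{Float64}(undef, n)`, makes no promise about the initial contents, which is consistent with the NaN values appearing only on some calls.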

Sorry for taking up everyone’s time with this.
