Not saying you’re as stupid as I am, but have you checked for NaNs in the full data set? I spent about 3 hours the other day working out what went wrong in an optimisation problem that worked with some of my data but not all of it, and it turned out there was an unexpected NaN in the data (stock prices, so a NaN value really didn’t make any sense at all…).
I guess this isn’t the case as you seem to be able to solve the problem on the full data with some optimizers, but thought I’d mention it anyway.
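For what it’s worth, a check along these lines would have saved me those 3 hours. It’s only a rough sketch in plain Python, assuming the data is a list of rows of floats; `find_nans` and `prices` are names I made up for illustration, not anything from your code.

```python
import math


def find_nans(rows):
    """Return (row_index, column_index) for every NaN in a list of numeric rows."""
    hits = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            # Only floats can be NaN; the isinstance check also skips
            # non-numeric entries that math.isnan would reject.
            if isinstance(value, float) and math.isnan(value):
                hits.append((i, j))
    return hits


# Made-up data for illustration: one bad entry hidden in the second row.
prices = [
    [101.2, 99.8],
    [102.0, float("nan")],
    [103.1, 100.4],
]

print(find_nans(prices))  # [(1, 1)]
```

Running something like this on the full data set before handing it to the optimizer at least tells you where the bad values are, rather than leaving you to guess from a failed solve.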
And to throw something else into the mix, there’s this old discussion about potential ways of catching NaNs as they appear: