In case anyone finds this at some point, I'm quite convinced the issue is data locality. The performance hit occurs when the inner arrays are filled with push!, which is what I was doing. When pushing, there is of course no guarantee that the inner arrays end up close to each other in memory, since each one may be reallocated somewhere else as it grows.
Making a deepcopy presumably allocates the inner arrays one right after another, so the copies end up adjacent or at least close together in memory. That would give better performance than the original layout, where each access to an inner array can land at an unrelated address.
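For anyone who wants to check this on their own machine, here is a minimal sketch of the comparison. The function names (`build_nested`, `sum_nested`, `address_span`) and the sizes are made up for illustration and are not from my original code. The address span is only a rough proxy for locality, and the timings will vary by machine and Julia version.

```julia
using Random

# Fill the inner arrays with interleaved push! calls, so that they grow
# (and may be reallocated) in turns rather than one after another.
function build_nested(n_inner::Int, n_elem::Int)
    nested = [Int[] for _ in 1:n_inner]
    for j in 1:n_elem, i in 1:n_inner
        push!(nested[i], i + j)
    end
    return nested
end

# Sum every element, visiting the inner arrays in the given order.
function sum_nested(nested, order)
    total = 0
    for i in order
        for x in nested[i]
            total += x
        end
    end
    return total
end

# Distance in bytes between the lowest and highest data pointer of the
# inner arrays. A rough indicator of how spread out they are.
function address_span(nested)
    addrs = [UInt(pointer(a)) for a in nested]
    return maximum(addrs) - minimum(addrs)
end

nested = build_nested(1_000, 1_000)
copied = deepcopy(nested)
order = shuffle(1:length(nested))  # random access pattern over inner arrays

println("address span, pushed: ", address_span(nested))
println("address span, copied: ", address_span(copied))

# Run once to compile, then time both layouts.
sum_nested(nested, order); sum_nested(copied, order)
@time sum_nested(nested, order)
@time sum_nested(copied, order)
```

If the locality explanation is right, the copied version should show a smaller address span and, for sufficiently large data, a shorter run time. Calling sizehint! on each inner array before pushing might also help, since it reduces the number of reallocations, but I have not measured that.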