This question is about how I should go about investigating why exactly my code is slow.
The scenario is that I have a large-ish (10GB) Vector{Any}
that I want to iterate over, do some calculations on, and insert the results into another Vector{Any},
which is then returned. I don’t know the size of the output vector in advance; it’s smaller than the input, but still large-ish, think on the order of 25% of the input.
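Schematically, the loop looks like this (`process`, `keep`, and the exact filter rate are made-up stand-ins for my real calculation, which I can't share here):

```julia
# Hypothetical stand-ins for the real per-element calculation and filter.
process(x) = x * 2
keep(y) = y % 8 == 0   # keeps roughly 25% of the elements

function build_output(input::Vector{Any})
    out = Any[]            # output vector of unknown final size
    for x in input
        y = process(x)
        keep(y) && push!(out, y)   # grows out incrementally
    end
    return out
end

input = Vector{Any}(1:100)
out = build_output(input)
```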
I run my current code with ProgressBars
and I see that at the start I’m doing about 200k iterations per second, but towards the end (15 minutes in) I’m doing roughly 100k it/s. The first version of the code has an obvious problem: when inserting into a huge Vector{Any},
you will sometimes trigger enormous reallocations and copies. Indeed, these are so large that I can see the ProgressBar
hang for a couple of seconds before continuing, and these hangups get longer and longer as time progresses. So the symptoms I’m seeing are consistent with my mental picture of the vector repeatedly reallocating and copying.
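To check that mental picture in isolation, I can time `push!` in fixed-size chunks and see whether the per-chunk cost grows over time or stays flat (`make_item` is again a stand-in for my real calculation):

```julia
make_item(i) = (i, i + 1)   # stand-in for the real per-element work

# Time each chunk of push! calls separately; growing chunk times would
# point at reallocation/copying (or GC) cost increasing with vector size.
function chunk_timings(nchunks, chunksize)
    out = Any[]
    times = Float64[]
    for c in 1:nchunks
        t = @elapsed for i in 1:chunksize
            push!(out, make_item(i))
        end
        push!(times, t)
    end
    return out, times
end

out, times = chunk_timings(10, 100_000)
for (c, t) in enumerate(times)
    println("chunk $c: $(round(t; digits = 4)) s")
end
```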
So here is what I’ve tried. While iterating the input, I insert into a Deque{Any}
instead, and at the end of the iteration I instantiate an uninitialized vector with out = Vector{Any}(undef, len)
and copy the elements over one by one:
```julia
len = length(out_deque)
out = Vector{Any}(undef, len)
for (n, i) in enumerate(out_deque)
    out[n] = i
end
```
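(As an aside, unless I’m missing something about Deque’s iteration protocol, the same copy can be written with `collect`, which sizes the output once from `length(out_deque)`:)

```julia
using DataStructures

# Small toy deque standing in for my real 10GB-derived one.
out_deque = Deque{Any}()
for x in 1:5
    push!(out_deque, (x, x^2))
end

# Equivalent to the manual enumerate-and-assign loop.
out = collect(Any, out_deque)
```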
According to the DataStructures
docs, deques are implemented as unrolled linked lists. Therefore, my naive expectation is that I should still get allocations, but large copies should no longer happen (or rather, no copy larger than one of the deque’s internal blocks should ever happen), and so the number of iterations per second shouldn’t go down.
Surprisingly, I still see the exact same symptoms. Iterations per second go down over time, I still see the same periodic hangups getting longer and longer, and I still start with 200k it/s and go down to 100k it/s.
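One thing I’ve considered (suggestions welcome on whether this is the right tool): since the symptoms survived the deque change, the pauses might be GC rather than copies, which I could check by sampling GC stats around a batch of work:

```julia
# Sketch: measure how much of the wall time for a batch is spent in GC.
# The push! loop is a stand-in for one batch of my real iteration.
stats_before = Base.gc_num()
t = @elapsed begin
    tmp = Any[]
    for i in 1:1_000_000
        push!(tmp, (i, i + 1))
    end
end
gc_ns = Base.GC_Diff(Base.gc_num(), stats_before).total_time  # nanoseconds
println("wall: $(round(t; digits = 3)) s, GC: $(gc_ns / 1e9) s")
```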
Question: How can I investigate what’s going on?