Hi all, I noticed that part of my code was running slowly, and then saw runtimes for push! and insert! on the order of µs. In contrast, other in-place methods such as deleteat! or setindex! are on the order of ns.
Are these runtimes expected? They seem slow to me. I’m on macOS, Julia 1.11.1.
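Roughly what I benchmarked (a reconstruction for concreteness; the exact indices are illustrative):

using BenchmarkTools
@btime push!(b, 0)             setup=(b=collect(1:60_000)) evals=1  # ~µs for me
@btime insert!(b, 1, 0)        setup=(b=collect(1:60_000)) evals=1  # ~µs
@btime deleteat!(b, length(b)) setup=(b=collect(1:60_000)) evals=1  # ~ns
@btime setindex!(b, 0, 1)      setup=(b=collect(1:60_000)) evals=1  # ~ns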
Isn’t it just because they both allocate, whereas the other two don’t? In your benchmark, setting evals=1 essentially guarantees an allocation (and copy) on every evaluation. If you don’t set evals=1, you indeed get much faster times.
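For example, the same push! benchmark without evals=1 (BenchmarkTools then reuses the array across the evaluations within each sample):

using BenchmarkTools
@btime push!(b, 0) setup=(b=collect(1:60_000))
# many evaluations share one array per setup, so most push!es
# find spare capacity and run in ~ns, amortized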
When you make a Vector, it allocates a chunk of memory to hold its items. Often some extra capacity is allocated (more space than the current length of the vector), but not too much, so that it doesn’t waste memory.
When you lengthen the Vector and exceed the length of the underlying allocation, Julia needs to request more space from the operating system, which is slow. But even when it doesn’t, it still needs to check the current capacity, and the extra branch makes it slower.
In-place operations don’t have this overhead.
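You can watch that spare capacity with @allocated (a sketch; the exact growth amounts are an implementation detail):

a = collect(1:10)         # collect sizes the buffer exactly to fit
@allocated push!(a, 11)   # nonzero: the buffer had to grow (allocate + copy)
@allocated push!(a, 12)   # 0: the previous growth left spare capacity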
Edit: Also, because of the numerous branches in a push!, the loop can’t be vectorized (SIMD) by the CPU. So a sequence of push! operations is slower than setindex! for that reason too.
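A rough way to see that difference (my own sketch; fill_push! and fill_set! are made-up names):

using BenchmarkTools

function fill_push!(v, n)
    for i in 1:n
        push!(v, i)        # capacity check + bookkeeping on every iteration
    end
    return v
end

function fill_set!(v, n)
    @inbounds for i in 1:n
        v[i] = i           # plain store; this loop can be SIMD-vectorized
    end
    return v
end

n = 60_000
@btime fill_push!(v, $n) setup=(v=sizehint!(Int[], $n)) evals=1
@btime fill_set!(v, $n)  setup=(v=Vector{Int}(undef, $n)) evals=1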
The first call has to allocate because the underlying memory is exactly the size of the array, so Julia has to get a new place to store the data (and copy the array over to the new place).
My guess is that for large arrays the initial allocation is exactly sized, so any size-extending operation needs to allocate. If you know you are going to extend the array, you should allocate a good size from the start.
As I understand it, push! grows the array, and the first time you do that it needs to allocate (and copy all the old values to the new location). Julia is smart enough to ask for more memory than you need, to avoid doing all this copying too often (the total overhead is amortized linear time). But that only pays off if you push! repeatedly onto the same vector; the first push! is indeed relatively costly.
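A quick way to see the amortization (a sketch; count_reallocs is a made-up name):

function count_reallocs(n)
    v = Int[]
    c = 0
    for _ in 1:n
        c += (@allocated push!(v, 0)) > 0   # did this particular push! reallocate?
    end
    return c
end

count_reallocs(100_000)   # logarithmically many: only the occasional push! reallocates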
The code I’m implementing effectively leads to a symmetric random walk of the array length: a balanced insert! of one element and deleteat! of one element, giving an array length of 60_000 on average. But individual simulations of course hit the bounds regularly and grow the array to a length greater than 60_000.
Maybe this repeated “hitting” of the array length bound is my problem… I will try sizehint!.
Just sizehint! to 60000 + T, where T is the number of steps, and you’ll be deterministically safe. If T is small enough this will work perfectly; if T is much larger than 60000, you can be less conservative and use something like the standard deviation of the random walk.
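Something along these lines (the value of T here is made up):

T = 10_000                 # number of simulation steps (illustrative)
a = collect(1:60_000)
sizehint!(a, 60_000 + T)   # one up-front reservation; later growth needn’t reallocate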
EDIT: I wonder if deleteat! might undo the effects of sizehint! at some point.
OK, I’m seeing an effect of sizehint! on the overall allocations, but hardly any on overall runtime. Possibly the code is already at its optimum, or there is something else going on. But thanks for the moment!!
EDIT: And sorry for the noisy title (I’ll add a “no”); I now have Julia 1.10 installed in parallel, to test for regressions before asking.
You might think you’re deleting at the end, but since Julia is 1-based you’re not. When you’re not at the end, the data has to be moved around, though in that case there is only 1 item “to the right”. I also tested deleting at the front, and it’s just as quick: then nothing is copied at all, the start of the array simply shifts right, as seen by a higher address from:
pointer(a)
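Concretely (the addresses themselves are machine-specific; only the direction of the change matters):

a = collect(1:10)
p0 = pointer(a)
deleteat!(a, 1)    # delete at the front
pointer(a) - p0    # positive (8 for Int64): the start moved right, no data was copied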
I’m looking into push!: it calls Base._growend!(a, 1), which in turn calls newmem = Base.array_new_memory(mem, newmemlen2), which allocates twice.
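(To follow along in the source yourself: using InteractiveUtils; @less push!(Int[], 1) jumps to the relevant method definition.)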
I now think (confirmed*) that this is the actual bottleneck in my code: insert! and deleteat! have to copy data back or forth by one position. This also fits the observation that sizehint! had little effect on runtime in my case.
Interestingly, the benchmarks below indicate that insert! and deleteat! are slower when the modified indices are near the end, and almost as fast as push! when the array is modified near the start (?).
*EDIT: I replaced insert! and deleteat! with push! and pop!, which is not what I want, but it keeps the overall nature of the code while removing the extra copying work of insert! and deleteat!. This gave drastically faster runtimes for the overall simulation algorithm.
Yes, if the insert! is exactly at the beginning (or if you use pushfirst!), Julia will allocate and move everything to the right, but it does so leaving some empty space before the start of the vector, so that further pushfirst!s are fast.
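You can see that slack being created (a sketch; the exact amount varies by version):

a = collect(1:10)
@allocated pushfirst!(a, 0)   # nonzero: grows and leaves empty space in front
@allocated pushfirst!(a, -1)  # 0: uses the slack, nothing has to move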
deleteat! at the last position is fast (it doesn’t need to move anything or allocate). The benchmark is no good because you’re not removing at the end.
Try with @btime deleteat!(a, length(a)) setup=(a=collect(1:60000)) evals=10000
insert! at the last position is also fast (it only allocates but doesn’t need to move anything). The benchmark is wrong because it is not inserting at the end after the first eval. Try with: @btime insert!(a, length(a)+1, 0) setup=(a=collect(1:60000)) evals=10000
But I wouldn’t call my benchmark wrong; the choice of indices was deliberate. I meant “rather at the start” vs. “rather at the end” of the array (not precisely the first and last index). Because my simulation draws random indices, the very first and very last index are negligible for my overall performance.
This behaviour was just interesting for me to see.
The distinction you may be missing is that index 2 is very near the start for all 10000 evals, whereas index 59999 is near the end at the first eval but not at the last eval, when the vector has grown to 70000 elements and roughly 10000 of them sit to the right of that index. I.e. there is a difference in how much data needs to be moved around in the two benchmarks.
Since moving the data around is the bottleneck, you should look for ways to reduce it. How important is it to keep the ordering of your elements? Are there other ways to handle that? What are the actual requirements on the vector?
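For instance, if the ordering turns out not to matter, the classic swap-and-pop trick makes removal at an arbitrary index O(1) (a sketch; swapremove! is a made-up name):

# Remove index i in O(1): overwrite it with the last element, then shrink by one.
function swapremove!(v::Vector, i::Integer)
    v[i] = v[end]
    pop!(v)
    return v
end

a = collect(1:10)
swapremove!(a, 3)   # index 3 now holds what was the last element; order not preserved

Together with plain push! for insertion, that eliminates the shifting entirely, at the cost of element order.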
I apologize; with “wrong” I just meant that they weren’t benchmarking what you wrote (“[…] the benchmarks below indicate that insert! and deleteat! are slower when the modified indices are near the end […]”). Sorry that it came across as unnecessarily negative. I agree, it was an interesting behavior to see and verify.