I need many large integer vectors, so I have been using Int32 rather than Int to minimise the memory footprint. Since the values must be positive, I call abs(i) on each element during iteration.
Before pursuing a change to my code, I did a quick (repeatable) timing check:
Sorry, maybe I should have left it at “actual use case” and not made up one potential (and likely irrelevant) example… My point was mainly that fluctuations of <1ns in benchmarks are unlikely to be informative of impacts on real-world code with perceivable runtimes.
An allocation free benchmark:
julia> function f(x)
           for i ∈ eachindex(x)
               x[i] = abs(x[i])
           end
           return x
       end;
julia> @btime f($x32);
4.543 μs (0 allocations: 0 bytes)
julia> @btime f($x64);
9.000 μs (0 allocations: 0 bytes)
Thanks so much for the extra detail - incredibly helpful.
So essentially, even though I can still reproduce my original results (without interpolation), the difference is purely an artifact of how I was benchmarking the value?
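For reference, the interpolation pitfall being discussed can be reproduced like this (a minimal sketch; the variable name x is made up):

```julia
using BenchmarkTools

x = Int32(-5)    # a non-constant global

@btime abs(x);   # times the untyped global access on every evaluation
@btime abs($x);  # $ interpolates the value, so only abs itself is timed
```

The first form typically reports a higher time because each evaluation must look up and dynamically dispatch on the global.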
Why is it that the single evaluation is comparable (in your test) but the loops are different (in Nils’s)?
Loops over sufficiently large arrays will be limited by the speed of memory access, not by the abs function. With Int64 there is twice as much memory to access compared to an Int32 array of the same length.
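To make the 2× memory difference concrete (a quick check; the array length here is illustrative):

```julia
x32 = zeros(Int32, 10^6)
x64 = zeros(Int64, 10^6)

sizeof(x32)  # 4000000 bytes: 4 bytes per element
sizeof(x64)  # 8000000 bytes: twice the data to stream through memory
```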
Also, AVX2 has an instruction for computing the absolute value of 8×32-bit integers, but not for 4×64-bit integers. So the latter is computed with a subtraction and a blend instruction, which is slower.
(memory may still be the bottleneck in reality, not sure)
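If you want to check this on your own machine, you can inspect the generated native code (the output is CPU-dependent; on an AVX2 CPU, vpabsd is the packed 32-bit absolute-value instruction):

```julia
using InteractiveUtils  # for code_native

function f(x)                       # same loop as in the benchmark above
    for i ∈ eachindex(x)
        x[i] = abs(x[i])
    end
    return x
end

# The Int32 loop should contain `vpabsd`, while the Int64 loop
# falls back to a compare/subtract/blend sequence.
code_native(f, (Vector{Int32},); debuginfo=:none)
code_native(f, (Vector{Int64},); debuginfo=:none)
```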