abs(::Int32) slower than abs(::Int)

Hi,

I need a lot of large integer vectors, so I have attempted to use Int32 rather than Int to minimise the memory footprint. As I need to ensure the values are positive, I call abs(i) on each element during iteration.

Before pursuing a change to my code, I did a quick (repeatable) timing check:

    i::Int32 = -1
    @btime abs(i) # 2.300 ns (0 allocations: 0 bytes)
    @btime abs(-1) # 1.000 ns (0 allocations: 0 bytes)

I’m sure there is a good reason for this, but is there any way to mitigate it?

Regards,

Interpolate in your benchmarks as the BenchmarkTools docs suggest?

julia> i32 = Int32(-1)
-1

julia> i64 = -1
-1

julia> @btime abs($i64)
  1.500 ns (0 allocations: 0 bytes)
1

julia> @btime abs($i32)
  1.900 ns (0 allocations: 0 bytes)
1

julia> @btime abs($i32)
  1.500 ns (0 allocations: 0 bytes)
1

Also, benchmarks in the <2 ns range aren’t super reliable; try something closer to your actual use case, maybe like:

julia> x32 = rand(Int32, 100_000); x64 = Int.(rand(Int32, 100_000));

julia> @btime abs.($x32);
  23.200 μs (2 allocations: 390.67 KiB)

julia> @btime abs.($x64);
  53.000 μs (2 allocations: 781.30 KiB)

There’s probably a difference in allocation time here. Perhaps pre-allocate the output vectors to better isolate the time spent in abs itself?
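For instance, a sketch of what that pre-allocated benchmark could look like (the `y32`/`y64` buffers are hypothetical, not from the posts above):

```julia
using BenchmarkTools

x32 = rand(Int32, 100_000); x64 = Int.(rand(Int32, 100_000));
y32 = similar(x32); y64 = similar(x64);  # pre-allocated output buffers

# Broadcasting into an existing array with .= avoids the allocations
# that abs.(x) incurs when it creates a fresh result vector.
@btime $y32 .= abs.($x32);
@btime $y64 .= abs.($x64);
```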

There’s also a factor-of-2 difference in SIMD width, which is not relevant for scalar abs. (SIMD may actually be relevant to the OP’s use case, though.)


Sorry, maybe I should have left it at “actual use case” and not made up one potential (and likely irrelevant) example… My point was mainly that fluctuations of <1 ns in benchmarks are unlikely to be informative about impacts on real-world code with perceivable runtimes.

An allocation free benchmark:

julia> function f(x)
           for i ∈ eachindex(x)
               x[i] = abs(x[i])
           end
           return x
       end;

julia> @btime f($x32);
  4.543 μs (0 allocations: 0 bytes)

julia> @btime f($x64);
  9.000 μs (0 allocations: 0 bytes)

You may need Ref interpolation:

julia> using BenchmarkTools

julia> @btime abs($(Ref(Int64(-1)))[]);
  1.500 ns (0 allocations: 0 bytes)

julia> @btime abs($(Ref(Int32(-1)))[]);
  1.500 ns (0 allocations: 0 bytes)

to prevent the compiler from evaluating the whole expression statically, as explained in the BenchmarkTools manual.


Thanks so much for the extra detail - incredibly helpful.

So essentially, even though I can still reproduce my original results (without interpolation), the difference is purely an artifact of how I was benchmarking?

Why is it that the single evaluations are comparable (in your test) but the loops differ (in Nils’s)?

Loops over sufficiently large arrays will be limited by the speed of memory access, not by the abs function. With Int64 there is twice as much memory to access compared to an Int32 array of the same length.
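Concretely, the factor of two in bytes can be checked with `sizeof` (the array names here are just for illustration):

```julia
x32 = rand(Int32, 100_000)
x64 = Int.(x32)

# sizeof on an array returns the total size of its data in bytes
sizeof(x32)  # 400000 bytes: 4 bytes per Int32
sizeof(x64)  # 800000 bytes: 8 bytes per Int64
```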


Also, AVX2 has instructions for computing the absolute value of 8x32-bit integers, but not 4x64-bit integers. So the latter is computed using a subtraction and a blend instruction, which is slower.
(memory may still be the bottleneck in reality, not sure)
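If you want to verify this on your own machine, `code_native` shows the emitted instructions (the exact output depends on your CPU and Julia version; `abs_inplace!` is just an example kernel, not from the posts above):

```julia
using InteractiveUtils  # provides code_native outside the REPL

# Simple in-place abs kernel whose loop body we want to inspect
function abs_inplace!(x)
    @inbounds @simd for i in eachindex(x)
        x[i] = abs(x[i])
    end
    return x
end

# On an AVX2 machine, look for `vpabsd` (packed 32-bit abs) in the Int32
# version; the Int64 version instead uses a subtract-and-blend sequence.
code_native(abs_inplace!, (Vector{Int32},); debuginfo=:none)
code_native(abs_inplace!, (Vector{Int64},); debuginfo=:none)
```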
