abs(::Int32) slower than abs(::Int)

Hi,

I need a lot of large integer vectors, so I have attempted to use Int32 rather than Int to minimise the memory footprint. As I need to ensure the values are positive, I call abs(i) on each element during iteration.

Before pursuing a change to my code, I did a quick (repeatable) timing check:

    i::Int32 = -1
    @btime abs(i) # 2.300 ns (0 allocations: 0 bytes)
    @btime abs(-1) # 1.000 ns (0 allocations: 0 bytes)

I’m sure there is a good reason for this, but is there any way to mitigate it?

Regards,

Interpolate in your benchmarks as the BenchmarkTools docs suggest?

julia> i32 = Int32(-1)
-1

julia> i64 = -1
-1

julia> @btime abs($i64)
  1.500 ns (0 allocations: 0 bytes)
1

julia> @btime abs($i32)
  1.900 ns (0 allocations: 0 bytes)
1

julia> @btime abs($i32)
  1.500 ns (0 allocations: 0 bytes)
1

Also, benchmarks in the <2 ns range aren’t super reliable; try something closer to your actual use case, maybe like:

julia> x32 = rand(Int32, 100_000); x64 = Int.(rand(Int32, 100_000));

julia> @btime abs.($x32);
  23.200 μs (2 allocations: 390.67 KiB)

julia> @btime abs.($x64);
  53.000 μs (2 allocations: 781.30 KiB)

There’s probably a difference in allocation time here. Perhaps pre-allocate the output vectors to better isolate the time spent in abs itself?
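For instance, a sketch of what that pre-allocated benchmark could look like (the `y32`/`y64` buffers are hypothetical, not from the posts above):

```julia
using BenchmarkTools

x32 = rand(Int32, 100_000); x64 = Int.(rand(Int32, 100_000));
y32 = similar(x32); y64 = similar(x64);  # pre-allocated output buffers

# Broadcasting into an existing array with .= avoids the allocations
# that abs.(x) incurs when it creates a fresh result vector.
@btime $y32 .= abs.($x32);
@btime $y64 .= abs.($x64);
```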

There’s also a factor-of-2 difference in SIMD width, which is not relevant for scalar abs. (SIMD may actually be relevant to the OP’s use case, though.)


Sorry, maybe I should have left it at “actual use case” and not made up one potential (and likely irrelevant) example… My point was mainly that fluctuations of <1 ns in benchmarks are unlikely to be informative about impacts on real-world code with perceivable runtimes.

An allocation free benchmark:

julia> function f(x)
           for i ∈ eachindex(x)
               x[i] = abs(x[i])
           end
           return x
       end;

julia> @btime f($x32);
  4.543 μs (0 allocations: 0 bytes)

julia> @btime f($x64);
  9.000 μs (0 allocations: 0 bytes)

You may need Ref interpolation:

julia> using BenchmarkTools

julia> @btime abs($(Ref(Int64(-1)))[]);
  1.500 ns (0 allocations: 0 bytes)

julia> @btime abs($(Ref(Int32(-1)))[]);
  1.500 ns (0 allocations: 0 bytes)

to prevent the compiler from evaluating the whole expression statically, as explained in the BenchmarkTools manual.


Thanks so much for the extra detail - incredibly helpful.

So essentially, even though I can still reproduce my original results (without interpolation), the difference is purely an artifact of how I was benchmarking?

Why is it that the single evaluations are comparable (in your test) but the loops differ (in Nils’s)?

Loops over sufficiently large arrays will be limited by the speed of memory access, not by the abs function. With Int64 there is twice as much memory to access compared to an Int32 array of the same length.
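Concretely, the factor of two in bytes can be checked with `sizeof` (the array names here are just for illustration):

```julia
x32 = rand(Int32, 100_000)
x64 = Int.(x32)

# sizeof on an array returns the total size of its data in bytes
sizeof(x32)  # 400000 bytes: 4 bytes per Int32
sizeof(x64)  # 800000 bytes: 8 bytes per Int64
```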


Also, AVX2 has instructions for computing the absolute value of 8x32-bit integers, but not 4x64-bit integers. So the latter is computed using a subtraction and a blend instruction, which is slower.
(memory may still be the bottleneck in reality, not sure)
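If you want to verify this on your own machine, `code_native` shows the emitted instructions (the exact output depends on your CPU and Julia version; `abs_inplace!` is just an example kernel, not from the posts above):

```julia
using InteractiveUtils  # provides code_native outside the REPL

# Simple in-place abs kernel whose loop body we want to inspect
function abs_inplace!(x)
    @inbounds @simd for i in eachindex(x)
        x[i] = abs(x[i])
    end
    return x
end

# On an AVX2 machine, look for `vpabsd` (packed 32-bit abs) in the Int32
# version; the Int64 version instead uses a subtract-and-blend sequence.
code_native(abs_inplace!, (Vector{Int32},); debuginfo=:none)
code_native(abs_inplace!, (Vector{Int64},); debuginfo=:none)
```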
