Performance of a type-unstable accumulator

tk3369 · July 9, 2019, 2:18am

One of the general performance tips is to make sure an accumulator has the same type as the data you're accumulating, by initializing it with the zero function. However, I see no visible effect in my performance tests below.

Has Julia gotten better at optimizing this, or is my case too trivial to expose the issue? I'm using Julia v1.1.1 on Mac.

julia> using BenchmarkTools

julia> random_floats = rand(Float64, 100000);

julia> function double_sum(data)
           total = 0
           for v in data
               total += 2 * v
           end
           return total
       end;

julia> @btime double_sum($random_floats);
  103.515 μs (0 allocations: 0 bytes)

julia> function double_sum(data)
           total = zero(eltype(data))
           for v in data
               total += 2 * v
           end
           return total
       end;

julia> @btime double_sum($random_floats);
  103.505 μs (0 allocations: 0 bytes)

StefanKarpinski · July 9, 2019, 2:25am

Yes, the compiler is pretty good at this these days.

pixel27 · July 9, 2019, 2:31am

using BenchmarkTools

random_floats = rand(Float64, 100000);

function double_sum(data)
   total::Float32 = 0  # force the accumulator to be a Float32
   for v in data
       total += 2 * v
   end
   return total
end;

@btime double_sum($random_floats);

function double_sum(data)
   total = zero(eltype(data))
   for v in data
       total += 2 * v
   end
   return total
end;

@btime double_sum($random_floats);

  360.057 μs (0 allocations: 0 bytes)
  120.063 μs (0 allocations: 0 bytes)

In the first function, since you didn't explicitly set a type for total, I'm guessing the compiler looked at the code and made it a Float64, so the performance is the same. When you force total to be a Float32, you see the difference…

tk3369 · July 9, 2019, 2:44am

That’s interesting. Looks like it’s bad because every Float64 has to be converted to Float32 before adding to the total. The overhead of the conversion is big.

More interestingly, if I declare total as Float64 and pass an array of Float32, then it performs well again. I guess the conversion from Float32 to Float64 is much cheaper.
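For reference, a minimal sketch of what a declaration like total::Float32 does to each update. The function name typed_local_demo is made up for illustration; the behavior shown (every assignment to a type-declared local goes through convert) is standard Julia semantics.

```julia
# Hypothetical helper, not from the thread: one update step of the
# Float32-declared accumulator, written out.
function typed_local_demo()
    total::Float32 = 0   # the Int 0 is converted and stored as a Float32
    # 2 * 0.1 is a Float64. total + Float64 promotes to Float64, and the
    # assignment then converts the sum back to Float32.
    total += 2 * 0.1
    return total
end

println(typeof(typed_local_demo()))
println(typed_local_demo())
```

So in the Float32 version of double_sum, each loop iteration appears to pay for a widening of total plus a narrowing of the sum, rather than a single conversion of v.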

kristoffer.carlsson · July 9, 2019, 5:56am

See "Union-splitting: what it is, and why you should care".

StefanKarpinski · July 10, 2019, 3:27am

Converting Float32 to Float64 is mostly just inserting some zero bits here and there. Converting from Float64 to Float32 requires rounding and checking for overflow, etc.
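A small sketch of that asymmetry, using only Base constructors. The variable names are illustrative. It shows that widening is exact (it round-trips), while narrowing can round or overflow; it does not measure the cost of either direction.

```julia
# Widening: every Float32 value is exactly representable as a Float64,
# so converting up and back returns the original value.
x32 = 0.1f0
widened = Float64(x32)
println(Float32(widened) === x32)

# Narrowing: the Float64 nearest to 0.1 has no exact Float32 equivalent,
# so the conversion has to round.
x64 = 0.1
narrowed = Float32(x64)
println(Float64(narrowed) == x64)

# Narrowing also has to handle values outside the Float32 range.
println(Float32(1e300))
```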

StefanKarpinski · July 10, 2019, 3:28am

A couple of things to try to check that:

Call double_sum on an empty Float64 vector and see what it returns.
Look at the LLVM code or machine code with @code_llvm and @code_native.
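A sketch of the first check, with the two versions from the original post renamed to double_sum_unstable and double_sum_stable (names chosen here so both can coexist). It also prints the inferred return types via Base.return_types, as a script-friendly stand-in for reading the @code_llvm output.

```julia
# Version from the first post: `total` starts as an Int.
function double_sum_unstable(data)
    total = 0
    for v in data
        total += 2 * v
    end
    return total
end

# Version using zero(eltype(data)): `total` starts as the element type.
function double_sum_stable(data)
    total = zero(eltype(data))
    for v in data
        total += 2 * v
    end
    return total
end

empty_input = Float64[]

# With no elements the loop never runs, so the initial value is returned as is.
println(typeof(double_sum_unstable(empty_input)))
println(typeof(double_sum_stable(empty_input)))

# Return types the compiler infers for a Vector{Float64} argument.
println(Base.return_types(double_sum_unstable, (Vector{Float64},)))
println(Base.return_types(double_sum_stable, (Vector{Float64},)))
```

If the first version reports a small Union of Int64 and Float64, the compiler did not simply turn total into a Float64; it kept both possibilities and still produced fast code, which is the situation the union-splitting post describes. For the second check, run @code_llvm double_sum_unstable(rand(3)) at the REPL.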
