Performance of a type-unstable accumulator

One of the Julia performance tips is to make sure an accumulator starts out with the same type as the data being accumulated, for example by initializing it with the zero function. However, I see no visible effect in my performance tests below.

Has Julia gotten better at optimizing this, or is this case too trivial to trigger the issue? I'm using Julia v1.1.1 on a Mac.

julia> using BenchmarkTools

julia> random_floats = rand(Float64, 100000);

julia> function double_sum(data)
           total = 0
           for v in data
               total += 2 * v
           end
           return total
       end;

julia> @btime double_sum($random_floats);
  103.515 μs (0 allocations: 0 bytes)

julia> function double_sum(data)
           total = zero(eltype(data))
           for v in data
               total += 2 * v
           end
           return total
       end;

julia> @btime double_sum($random_floats);
  103.505 μs (0 allocations: 0 bytes)
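The two versions benchmark the same, but inference still sees a difference between them. A minimal sketch that asks for the inferred return types; `double_sum_int0` and `double_sum_zero` are hypothetical renames of the two definitions above so both can exist at once, and `Base.return_types` is an internal reflection helper whose output format may change between Julia versions (`@code_warntype` shows the same thing interactively).

```julia
# Hypothetical renames of the two definitions above, so both can be inspected side by side.
function double_sum_int0(data)
    total = 0                      # starts as an Int, becomes a Float64 after the first iteration
    for v in data
        total += 2 * v
    end
    return total
end

function double_sum_zero(data)
    total = zero(eltype(data))     # starts with the element type of data
    for v in data
        total += 2 * v
    end
    return total
end

# Inferred return types for a Vector{Float64} argument (one entry per matching method).
unstable_T = first(Base.return_types(double_sum_int0, (Vector{Float64},)))
stable_T   = first(Base.return_types(double_sum_zero, (Vector{Float64},)))

println(unstable_T)   # a Union of Int and Float64
println(stable_T)     # Float64
```

The union does not show up in the timings because the compiler can generate a separate, concrete code path for each member of a small union.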

Yes, the compiler is pretty good at this these days.

using BenchmarkTools

random_floats = rand(Float64, 100000);

function double_sum(data)
    total::Float32 = 0          # declared type: every assignment to total converts to Float32
    for v in data
        total += 2 * v
    end
    return total
end;

@btime double_sum($random_floats);
  360.057 μs (0 allocations: 0 bytes)

function double_sum(data)
    total = zero(eltype(data))  # accumulator matches the element type (Float64 here)
    for v in data
        total += 2 * v
    end
    return total
end;

@btime double_sum($random_floats);
  120.063 μs (0 allocations: 0 bytes)

In your first function you didn't declare a type for total, so I'm guessing the compiler figured out from the code that it ends up as a Float64, which is why the performance is the same. When you force total to be a Float32, you see the difference.

That's interesting. It looks like it's slow because on every iteration the Float64 result of total + 2 * v has to be converted back to Float32 before it is stored in total, and that conversion overhead is large.
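To make that conversion explicit: a type declaration on a local variable makes every assignment to it go through `convert`, and `total + 2 * v` is computed in `Float64` because `Float32 + Float64` promotes. The sketch below spells out roughly what the `total::Float32` loop means at the language level; `double_sum_f32` and `double_sum_f32_explicit` are hypothetical names, and this shows the semantics, not the machine code the optimizer emits.

```julia
# Hypothetical rename of the Float32-accumulator version above.
function double_sum_f32(data)
    total::Float32 = 0
    for v in data
        total += 2 * v
    end
    return total
end

# Hypothetical hand-expanded equivalent of the loop above.
function double_sum_f32_explicit(data::AbstractVector{Float64})
    total = 0.0f0                         # Float32 accumulator
    for v in data
        s = total + 2 * v                 # Float32 + Float64 promotes: s is a Float64
        total = convert(Float32, s)       # rounded back to Float32 on every iteration
    end
    return total
end

xs = rand(Float64, 1000)
double_sum_f32_explicit(xs) === double_sum_f32(xs)   # same bits, same rounding steps
```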

More interestingly, if I declare total as Float64 and pass an array of Float32, then it performs well again. I guess the conversion from Float32 to Float64 is much cheaper.
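To compare both directions without redefining the function each time, one option is to pass the accumulator type as an argument. A minimal sketch; `double_sum_as` is a hypothetical name, not something from this thread. The explicit `convert(T, ...)` keeps the accumulator in `T` on every iteration, like a `total::T` declaration would.

```julia
# Hypothetical helper: accumulate in type T regardless of the element type of data.
function double_sum_as(::Type{T}, data) where {T}
    total = zero(T)
    for v in data
        total = convert(T, total + 2 * v)   # keep the accumulator in T on every iteration
    end
    return total
end

f32 = rand(Float32, 100_000)
f64 = rand(Float64, 100_000)

double_sum_as(Float64, f32)   # Float32 elements widened to Float64 (exact)
double_sum_as(Float32, f64)   # each Float64 partial sum rounded to Float32
```

Timing these two calls with BenchmarkTools' `@btime` (interpolating `$f32` and `$f64`, as above) is one way to check the asymmetry described here.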

See https://julialang.org/blog/2018/08/union-splitting
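Roughly, union splitting means that when inference knows a variable is one of a small set of concrete types (here `Union{Int, Float64}` for `total` in the `total = 0` version), the compiler inserts a type check and a specialized branch for each member instead of a dynamic dispatch. The hand-written version below is a hypothetical illustration of that idea (`add_twice_split` is not a real API), not the code the compiler actually generates.

```julia
# Hand-written analogue of union splitting for total + 2 * v,
# where total may be an Int or a Float64 (hypothetical illustration).
function add_twice_split(total::Union{Int, Float64}, v::Float64)
    if total isa Int
        return Float64(total) + 2 * v   # branch for Int: promote, then add
    else
        return total + 2 * v            # branch for Float64: total must be a Float64 here
    end
end

add_twice_split(1, 0.5)      # Int branch
add_twice_split(1.5, 0.25)   # Float64 branch
```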

Converting Float32 to Float64 is exact: it mostly amounts to re-biasing the exponent and padding the significand with zero bits. Converting Float64 to Float32 requires rounding and checking for overflow, etc.
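A quick way to see the asymmetry using only Base conversions: every `Float32` value is exactly representable as a `Float64`, so widening and converting back recovers the original, while narrowing has to round and can overflow to `Inf32` or underflow to zero.

```julia
# Widening Float32 -> Float64 is exact, so converting back recovers the same value.
x32 = 0.1f0
widened = Float64(x32)
roundtrip32 = Float32(widened)       # identical to x32

# Narrowing Float64 -> Float32 rounds to the nearest Float32.
x64 = 0.1
narrowed = Float32(x64)
back64 = Float64(narrowed)           # 0.10000000149011612, no longer equal to x64

# Narrowing can also overflow or underflow.
too_big   = Float32(1e300)           # Inf32
too_small = Float32(1e-50)           # 0.0f0
```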

A couple of things to try to check whether total really just becomes a Float64 in the total = 0 version:

  1. Call double_sum on an empty Float64 vector and see what it returns.
  2. Look at the LLVM code or machine code with @code_llvm and @code_native.
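Following the first suggestion: with `total = 0`, the loop body never runs for an empty vector, so the function returns the `Int` literal rather than a `Float64`. A sketch reusing the original definition under a hypothetical name (`double_sum_int0`) so it can coexist with the other versions:

```julia
# The original total = 0 version, renamed (hypothetical name).
function double_sum_int0(data)
    total = 0                 # Int literal
    for v in data
        total += 2 * v
    end
    return total
end

r_empty = double_sum_int0(Float64[])      # loop never runs: returns the Int 0
r_full  = double_sum_int0([1.0, 2.0])     # loop runs: returns a Float64

println(typeof(r_empty))   # Int (Int64 on a 64-bit machine)
println(typeof(r_full))    # Float64
```

For the second suggestion, `@code_llvm double_sum_int0(rand(10))` (from InteractiveUtils, loaded by default in the REPL) shows the generated code for this version.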