One of the general performance tips is to make sure an accumulator has the same type as the data being accumulated, e.g. by initializing it with the zero function. Yet there's no visible effect in my performance tests below.

Has Julia gotten better at optimizing this, or is this case too trivial to trigger the issue? I'm using Julia v1.1.1 on Mac.

julia> using BenchmarkTools

julia> random_floats = rand(Float64, 100000);

julia> function double_sum(data)
           total = 0
           for v in data
               total += 2 * v
           end
           return total
       end;

julia> @btime double_sum($random_floats);
  103.515 μs (0 allocations: 0 bytes)

julia> function double_sum(data)
           total = zero(eltype(data))
           for v in data
               total += 2 * v
           end
           return total
       end;

julia> @btime double_sum($random_floats);
  103.505 μs (0 allocations: 0 bytes)
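One way to see why the two versions time the same is to check what the compiler infers for total. A minimal sketch (the function name here is just for illustration):

```julia
function double_sum_int_init(data)
    total = 0                # Int zero: the type changes on the first iteration
    for v in data
        total += 2 * v       # Int + Float64 promotes to Float64
    end
    return total
end

# For a non-empty Float64 array the result is always a Float64; inference
# sees total as the small Union{Float64, Int64}, which recent Julia
# versions handle efficiently via union-splitting (visible in
# @code_warntype double_sum_int_init(rand(3))).
@assert double_sum_int_init(rand(Float64, 10)) isa Float64
@assert double_sum_int_init(Float64[]) isa Int   # empty input returns the Int zero
```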

using BenchmarkTools

random_floats = rand(Float64, 100000);

function double_sum(data)
    total::Float32 = 0
    for v in data
        total += 2 * v
    end
    return total
end;
@btime double_sum($random_floats);

function double_sum(data)
    total = zero(eltype(data))
    for v in data
        total += 2 * v
    end
    return total
end;
@btime double_sum($random_floats);

In the first function, since you didn't explicitly set a type for total, I'm guessing the compiler looked at the code and made it a Float64, so the performance is the same. When you force total to be a Float32, you see the difference…

That's interesting. Looks like it's slow because every Float64 has to be converted to Float32 before being added to the total, and the overhead of that conversion is significant.
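Concretely, a typed local like total::Float32 makes Julia convert on every assignment; each iteration of the loop above behaves roughly like this sketch (step is a hypothetical helper, not part of the original code):

```julia
# Hypothetical helper modeling one iteration of the Float32-accumulator loop.
# In Julia, `x::T; x += y` lowers to `x = convert(T, x + y)`.
function step(total::Float32, v::Float64)
    return convert(Float32, Float64(total) + 2 * v)  # widen, add, round back down
end

@assert step(0.0f0, 0.5) === 1.0f0
@assert step(1.0f0, 0.25) === 1.5f0
```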

More interestingly, if I declare total as Float64 and pass an array of Float32, it performs well again. I guess the conversion from Float32 to Float64 is much cheaper.

Converting Float32 to Float64 mostly just inserts zero bits here and there. Converting from Float64 to Float32 requires rounding, checking for overflow, etc.
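This is easy to check at the REPL: widening is exact and round-trips perfectly, while narrowing rounds and can overflow. A small sketch:

```julia
# Widening Float32 -> Float64 is exact: the extra mantissa bits are zeros,
# so the value survives a round trip unchanged.
x = 0.1f0
@assert Float32(Float64(x)) === x

# Narrowing Float64 -> Float32 rounds: the nearest Float32 to 0.1
# is not equal to the original Float64 value.
@assert Float32(0.1) != 0.1

# Narrowing can also overflow: 1e39 exceeds floatmax(Float32) ≈ 3.4e38.
@assert Float32(1e39) == Inf32
```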