One of the general performance tips is to ensure that an accumulator has the same type as the data you’re accumulating, by initializing it with the zero function. However, I see no visible effect in my performance tests below.
Has Julia gotten better at optimization, or is this case too trivial to bring out the issue? I’m using Julia v1.1.1 on Mac.
julia> using BenchmarkTools  # provides @btime
julia> random_floats = rand(Float64, 100000);
julia> function double_sum(data)
           total = 0
           for v in data
               total += 2 * v
           end
           return total
       end;
julia> @btime double_sum($random_floats);
  103.515 μs (0 allocations: 0 bytes)
julia> function double_sum(data)
           total = zero(eltype(data))
           for v in data
               total += 2 * v
           end
           return total
       end;
julia> @btime double_sum($random_floats);
  103.505 μs (0 allocations: 0 bytes)
Yes, the compiler is pretty good at this these days.
using BenchmarkTools  # provides @btime
random_floats = rand(Float64, 100000);
# Version 1: accumulator forced to Float32
function double_sum(data)
    total::Float32 = 0
    for v in data
        total += 2 * v
    end
    return total
end;
@btime double_sum($random_floats);
# Version 2: accumulator matches the element type of data
function double_sum(data)
    total = zero(eltype(data))
    for v in data
        total += 2 * v
    end
    return total
end;
@btime double_sum($random_floats);
  360.057 μs (0 allocations: 0 bytes)
  120.063 μs (0 allocations: 0 bytes)
In the first function, since you didn’t explicitly set a type for total, I’m guessing the compiler looked at the code and made it a Float64, so the performance is the same. When you force total to be a Float32 you see the difference…
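A minimal sketch of what the type annotation changes, using the hypothetical names double_sum_f32 and double_sum_zero (they are not from the thread, they only mirror the two versions above). A local declared as total::Float32 has every value assigned to it converted to Float32, so the result type no longer follows the element type of the data:

```julia
# Hypothetical names; these mirror the two versions benchmarked above.
function double_sum_f32(data)
    total::Float32 = 0          # every assignment to total is converted to Float32
    for v in data
        total += 2 * v          # Float32 + Float64 is computed in Float64, then converted back
    end
    return total
end

function double_sum_zero(data)
    total = zero(eltype(data))  # accumulator starts with the element type of data
    for v in data
        total += 2 * v
    end
    return total
end

data = [1.0, 2.0, 3.0]
println(typeof(double_sum_f32(data)))   # Float32
println(typeof(double_sum_zero(data)))  # Float64
```

This only shows the types involved; it says nothing about timing, which still has to be measured with @btime as above.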
That’s interesting. Looks like it’s bad because every Float64 has to be converted to Float32 before adding to the total. The overhead of the conversion is big.
More interestingly, if I declare total as Float64 and pass an array of Float32, then it performs well again. I guess the conversion from Float32 to Float64 is much cheaper.
Converting Float32 to Float64 is mostly just inserting some zero bits here and there. Converting from Float64 to Float32 requires rounding, checking for overflow, etc.
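The asymmetry between the two directions can be seen from the values alone. This is only a small illustration of exactness, rounding, and overflow, not a measurement of conversion cost; the variable names are made up for the example:

```julia
# Widening: every Float32 value is exactly representable as a Float64,
# so a round trip through Float64 gives back the same Float32.
x32 = 0.1f0
println(Float32(Float64(x32)) === x32)   # true

# Narrowing: most Float64 values are not representable as Float32
# and have to be rounded, so the round trip changes the value.
x64 = 0.1
println(Float64(Float32(x64)) == x64)    # false

# Narrowing can also overflow: 1e40 is beyond the largest finite Float32.
println(isinf(Float32(1e40)))            # true
println(floatmax(Float32) < 1e40)        # true
```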
A couple of things to try to check that:
- Call double_sum on an empty Float64 vector, see what it returns.
- Look at the LLVM code or machine code with @code_llvm and @code_native.
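A sketch of both checks, assuming the two versions are given distinct hypothetical names (double_sum_int and double_sum_zero) so they can be compared side by side. In the REPL, @code_llvm and @code_native are available directly; in a script they come from InteractiveUtils:

```julia
using InteractiveUtils  # provides @code_llvm and @code_native outside the REPL

# Hypothetical name: the version from the question, with an integer literal.
function double_sum_int(data)
    total = 0
    for v in data
        total += 2 * v
    end
    return total
end

# Hypothetical name: the version that uses zero(eltype(data)).
function double_sum_zero(data)
    total = zero(eltype(data))
    for v in data
        total += 2 * v
    end
    return total
end

# Check 1: with an empty vector the loop never runs, so the initial value is returned.
println(double_sum_int(Float64[]))    # 0
println(double_sum_zero(Float64[]))   # 0.0

# Check 2: inspect the generated code (use @code_native the same way for machine code).
@code_llvm double_sum_zero(rand(Float64, 10))
```

The empty-vector result shows that total in the first version starts out as an Int and only becomes a Float64 once the loop body runs, so the equal timings presumably come from the compiler handling that small set of types efficiently rather than from total being a Float64 from the start.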