There has been some discussion in several fora about the speed of broadcasting vs. a for loop, but some aspects still escape me. The Julia script below produces this output:
Explicit for loop:
1.310 μs (0 allocations: 0 bytes)
1.309 μs (0 allocations: 0 bytes)
1.305 μs (0 allocations: 0 bytes)
Broadcasting:
8.831 μs (2 allocations: 78.17 KiB)
8.599 μs (2 allocations: 78.17 KiB)
8.336 μs (2 allocations: 78.17 KiB)
Broadcasting inbounds:
8.062 μs (2 allocations: 78.17 KiB)
8.676 μs (2 allocations: 78.17 KiB)
9.170 μs (2 allocations: 78.17 KiB)
In this simple example, broadcasting causes two allocations and is slower by a factor of about 6. I have tried the @inbounds macro, but it makes no measurable difference, so I guess the allocations are the problem. The number of allocations varies with the size of v.
So once again, why is broadcasting so much slower than the for loop? And why does the number of allocations vary with the size of v?
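One way to test the guess about allocations is to compare the out-of-place broadcast with a fused in-place one. The sketch below is only an illustration: broadcasting_inplace and broadcasting_copy are hypothetical names that are not part of my script, and @allocated from Base stands in for BenchmarkTools. Note that v.^2 builds a new 10000-element Float64 array, i.e. 10000 × 8 = 80000 bytes ≈ 78.1 KiB, which is close to the 78.17 KiB reported above.

```julia
# Hypothetical in-place variant: the fused .= writes the squares back into
# the existing array instead of binding the name v to a new array.
function broadcasting_inplace(v::Vector{Float64})::Vector{Float64}
    v .= v .^ 2   # same effect as: @. v = v^2
    return v
end

# Out-of-place form, repeated here so the sketch is self-contained.
function broadcasting_copy(v::Vector{Float64})::Vector{Float64}
    w = v .^ 2    # allocates a new result array
    return w
end

v = collect(1.0:10000.0)

# Warm up both functions on copies so compilation is not counted.
broadcasting_inplace(copy(v))
broadcasting_copy(copy(v))

# Bytes allocated by a single call of each variant.
inplace_bytes = @allocated broadcasting_inplace(v)
copy_bytes = @allocated broadcasting_copy(v)
println("in-place: ", inplace_bytes, " bytes; out-of-place: ", copy_bytes, " bytes")
```

If the in-place variant reports far fewer bytes, the timing gap is probably dominated by allocating and filling the result array rather than by broadcasting as such.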
#!/usr/bin/env julia
import BenchmarkTools
function explicit_for_loop(v::Vector{Float64})::Vector{Float64}
    for (index, value) in enumerate(v)
        v[index] = value^2
    end
    return v
end

function broadcasting(v::Vector{Float64})::Vector{Float64}
    v = v.^2
    return v
end

function broadcasting_inbounds(v::Vector{Float64})::Vector{Float64}
    @inbounds v = v.^2
    return v
end

function reset()::Vector{Float64}
    v::Vector{Float64} = [ x for x in range(1, 10000)]
    return v
end
v::Vector{Float64} = reset()
println("Explicit: ", explicit_for_loop(v))
v = reset()
println("Broadcast: ", broadcasting(v))
v = reset()
println("Broadcast inbounds: ", broadcasting_inbounds(v))
println("Explicit for loop:")
v = reset()
BenchmarkTools.@btime explicit_for_loop(v)
v = reset()
BenchmarkTools.@btime explicit_for_loop(v)
v = reset()
BenchmarkTools.@btime explicit_for_loop(v)
println("Broadcasting:")
v = reset()
BenchmarkTools.@btime broadcasting(v)
v = reset()
BenchmarkTools.@btime broadcasting(v)
v = reset()
BenchmarkTools.@btime broadcasting(v)
println("Broadcasting inbounds:")
v = reset()
BenchmarkTools.@btime broadcasting_inbounds(v)
v = reset()
BenchmarkTools.@btime broadcasting_inbounds(v)
v = reset()
BenchmarkTools.@btime broadcasting_inbounds(v)
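
A caveat about the measurements above: explicit_for_loop modifies v in place, and @btime calls the function many times, so after the first few calls the loop is squaring already-squared values. A possible way to rule this out, sketched here under the assumption that the setup and evals keywords of BenchmarkTools are the right tool, is to rebuild the input before every sample. fresh_input is a hypothetical stand-in for reset() from the script.

```julia
import BenchmarkTools

# Same loop as in the script, repeated so the sketch runs on its own.
function explicit_for_loop(v::Vector{Float64})::Vector{Float64}
    for (index, value) in enumerate(v)
        v[index] = value^2
    end
    return v
end

# Hypothetical helper standing in for reset() from the script.
fresh_input() = collect(1.0:10000.0)

# setup builds a new vector before every sample, and evals=1 runs the
# function once per sample, so each timed call sees unsquared input.
BenchmarkTools.@btime explicit_for_loop(v) setup=(v = fresh_input()) evals=1
```

With evals=1 each sample is a single call of roughly a microsecond, so timer resolution may add noise; the numbers are best read as a sanity check against the results above rather than as a replacement for them.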