# Dot vs. array comprehension vs. loop

Hello! New Julia user here.
Following this post, I try to learn the way to speed up sums / integrals. I am doing the following test:

``````using BenchmarkTools

xvec = range(0, 4, length=1000)

function fdot()

function _f1(x)
ans = 1/(1+x^2)
return ans
end

ans = sum(@. xvec * _f1(xvec))
return ans
end

function fcompre()

function _f2(x)
ans = 1/(1+x^2)
return ans
end

ans = sum(x * _f2(x) for x in xvec)
return ans
end

function floop()

function _f3(x)
ans = 1/(1+x^2)
return ans
end

ans = 0.0
for x in xvec
ans += x * _f3(x)
end
return ans
end

@btime \$fdot()
@btime \$fcompre()
@btime \$floop()
``````

and get the output

`````` 1.249 ms (4628 allocations: 133.45 KiB)
94.327 μs (4002 allocations: 62.58 KiB)
183.405 μs (6491 allocations: 117.05 KiB)
``````

I am not sure why there are such big differences in the performance. Thank you for your help!

First, you are breaking the very first rule in the performance tips: you are using a global variable `xvec`, rather than passing it as a parameter.

Second, you inner functions `_f1(x)` etcetera are referring to the variable `ans`, and Julia thinks that this is referring to the variable `ans` in the surrounding scope, which is not what you want (and is leading to unwanted allocations).

A better comparison would be:

``````function fdot(xvec)
_f1(x) = 1/(1+x^2)
return sum(@. xvec * _f1(xvec))
end
function fcompre(xvec)
_f2(x) = 1/(1+x^2)
return sum(x * _f2(x) for x in xvec)
end
function floop(xvec)
_f3(x) = 1/(1+x^2)
ans = 0.0
for x in xvec
ans += x * _f3(x)
end
return ans
end
``````

from which I get:

``````julia> @btime fdot(\$xvec); @btime fcompre(\$xvec); @btime floop(\$xvec);
3.776 μs (1 allocation: 7.94 KiB)
3.488 μs (0 allocations: 0 bytes)
3.402 μs (0 allocations: 0 bytes)
``````

Note that `fdot` is slower because it allocates a temporary vector (to store `@. xvec * _f1(xvec)`) before calling `sum`.

I try to learn the way to speed up sums / integrals.

Note also that if you are computing integrals, you can generally do much better by using a quadrature scheme based on unequally-spaced points, e.g. via the QuadGK.jl package.

12 Likes

Thank you for the explanation. May be what I try to understand is the huge difference in choosing one of the option to perform the sum:

``````function test2()

function fpick()

xvec = range(0.0, 4.0, length=1000)
function _f1(x)
ans = 1/(1+x^2)
return ans
end

# pick one
ans = sum(@. xvec * _f1(xvec))
#ans = sum(x * _f1(x) for x in xvec)
#=
ans = 0.0
for x in xvec
ans += x * _f1(x)
end
=#
return ans
end

fpick()
end
``````

and your answer for the need to temporary store a vector for @. helps.

For integrals, I am using Gaussian Legendre from FastGaussQuadrature.jl, and is surprised to find

``````xvec, xweight = gausslegendre(N)

# pick one
ans = sum(@. xweight * _f(xvec) )
ans = sum( xw * _f(x) for (x, xw) in zip(xvec, xweight))
``````

gives quite different performance in some (other) integrals I have. Thank you again!

You appear to have missed an important point of what @stevengj was telling you re scope. Note the improvement in speed.

``````using BenchmarkTools

function test2()

function fpick()

xvec = range(0.0, 4.0, length=1000)
function _f1(x)
local ans = 1/(1+x^2)
return ans
end

# pick one
local ans = sum(@. xvec * _f1(xvec))
#ans = sum(x * _f1(x) for x in xvec)
#=
ans = 0.0
for x in xvec
ans += x * _f1(x)
end
=#
return ans
end

fpick()
end

function test3()

function fpick()

xvec = range(0.0, 4.0, length=1000)
function _f1(x)
local ans = 1/(1+x^2)
return ans
end

# pick one
# local ans = sum(@. xvec * _f1(xvec))
local ans = sum(x * _f1(x) for x in xvec)
#=
ans = 0.0
for x in xvec
ans += x * _f1(x)
end
=#
return ans
end

fpick()
end
``````
2 Likes

Thank you. Indeed I have not understood that point. Now I do! I thought (wrongly) that “ans” inside are unaffected by “ans” outside.
[what I think is still true: “ans” inside _f1, despite the “ans” outside, should be local since it is an assignment (without the global flag), and will not modify the “ans” outside. The functions will get the right answer, but using the same names seems to make it necessary to do a lot of pointless allocations, and that slow things down. ]

And indeed simply renaming / adding local for “ans” inside function speed up the all options. Thank you!

1 Like