Dot vs. array comprehension vs. loop

pmlo · July 4, 2021, 7:50pm

Hello! New Julia user here.
Following this post, I try to learn the way to speed up sums / integrals. I am doing the following test:

using BenchmarkTools

xvec = range(0, 4, length=1000)

function fdot()

    function _f1(x)
        ans = 1/(1+x^2)
        return ans
    end

    ans = sum(@. xvec * _f1(xvec))
    return ans
end

function fcompre()

    function _f2(x)
        ans = 1/(1+x^2)
        return ans
    end

    ans = sum(x * _f2(x) for x in xvec)
    return ans
end

function floop()

    function _f3(x)
        ans = 1/(1+x^2)
        return ans
    end

    ans = 0.0
    for x in xvec
        ans += x * _f3(x)
    end
    return ans
end

@btime $fdot()
@btime $fcompre()
@btime $floop()

and get the output

 1.249 ms (4628 allocations: 133.45 KiB)
  94.327 μs (4002 allocations: 62.58 KiB)
  183.405 μs (6491 allocations: 117.05 KiB)

I am not sure why there are such big differences in the performance. Thank you for your help!

stevengj · July 4, 2021, 8:23pm

First, you are breaking the very first rule in the performance tips: you are using a global variable xvec, rather than passing it as a parameter.

Second, you inner functions _f1(x) etcetera are referring to the variable ans, and Julia thinks that this is referring to the variable ans in the surrounding scope, which is not what you want (and is leading to unwanted allocations).

A better comparison would be:

function fdot(xvec)
    _f1(x) = 1/(1+x^2)
    return sum(@. xvec * _f1(xvec))
end
function fcompre(xvec)
    _f2(x) = 1/(1+x^2)
    return sum(x * _f2(x) for x in xvec)
end
function floop(xvec)
    _f3(x) = 1/(1+x^2)
    ans = 0.0
    for x in xvec
        ans += x * _f3(x)
    end
    return ans
end

from which I get:

julia> @btime fdot($xvec); @btime fcompre($xvec); @btime floop($xvec);
  3.776 μs (1 allocation: 7.94 KiB)
  3.488 μs (0 allocations: 0 bytes)
  3.402 μs (0 allocations: 0 bytes)

Note that fdot is slower because it allocates a temporary vector (to store @. xvec * _f1(xvec)) before calling sum.

I try to learn the way to speed up sums / integrals.

Note also that if you are computing integrals, you can generally do much better by using a quadrature scheme based on unequally-spaced points, e.g. via the QuadGK.jl package.

pmlo · July 4, 2021, 9:23pm

Thank you for the explanation. May be what I try to understand is the huge difference in choosing one of the option to perform the sum:

function test2()

    function fpick()

        xvec = range(0.0, 4.0, length=1000)
        function _f1(x)
            ans = 1/(1+x^2)
            return ans
        end

        # pick one
        ans = sum(@. xvec * _f1(xvec))
        #ans = sum(x * _f1(x) for x in xvec)
        #=
        ans = 0.0
        for x in xvec
            ans += x * _f1(x)
        end
        =#
        return ans
    end

    fpick()
end

and your answer for the need to temporary store a vector for @. helps.

For integrals, I am using Gaussian Legendre from FastGaussQuadrature.jl, and is surprised to find

xvec, xweight = gausslegendre(N)

# pick one
ans = sum(@. xweight * _f(xvec) )
ans = sum( xw * _f(x) for (x, xw) in zip(xvec, xweight))

gives quite different performance in some (other) integrals I have. Thank you again!

Joris_Pinkse · July 4, 2021, 10:26pm

You appear to have missed an important point of what @stevengj was telling you re scope. Note the improvement in speed.

using BenchmarkTools

function test2()

    function fpick()

        xvec = range(0.0, 4.0, length=1000)
        function _f1(x)
            local ans = 1/(1+x^2)
            return ans
        end

        # pick one
        local ans = sum(@. xvec * _f1(xvec))
        #ans = sum(x * _f1(x) for x in xvec)
        #=
        ans = 0.0
        for x in xvec
            ans += x * _f1(x)
        end
        =#
        return ans
    end

    fpick()
end


function test3()

    function fpick()

        xvec = range(0.0, 4.0, length=1000)
        function _f1(x)
            local ans = 1/(1+x^2)
            return ans
        end

        # pick one
        # local ans = sum(@. xvec * _f1(xvec))
        local ans = sum(x * _f1(x) for x in xvec)
        #=
        ans = 0.0
        for x in xvec
            ans += x * _f1(x)
        end
        =#
        return ans
    end

    fpick()
end

pmlo · July 4, 2021, 10:56pm

Thank you. Indeed I have not understood that point. Now I do! I thought (wrongly) that “ans” inside are unaffected by “ans” outside.
[what I think is still true: “ans” inside _f1, despite the “ans” outside, should be local since it is an assignment (without the global flag), and will not modify the “ans” outside. The functions will get the right answer, but using the same names seems to make it necessary to do a lot of pointless allocations, and that slow things down. ]

And indeed simply renaming / adding local for “ans” inside function speed up the all options. Thank you!