Optimizing function to evaluate elementary symmetric polynomials

Hi,
I am working on optimizing the following function:

@inbounds function get_symmetric_polynomials!(dest::Matrix{Complex{MultiFloat{Float64, 2}}}, roots::Matrix{Complex{MultiFloat{Float64, 2}}}, b::Int64)

    if b == 0
        dest[1, :] .= one(Complex{MultiFloat{Float64, 2}})
        return
    elseif b == 1
        dest[1, :] .= one(Complex{MultiFloat{Float64, 2}})
        dest[2, :] .= sum.(eachcol(roots))
        return
    end

    dest[1, :] .= one(Complex{MultiFloat{Float64, 2}})
    dest[2:end, :] .= zero(Complex{MultiFloat{Float64, 2}})

    for i in axes(roots, 1)
        for j in min(i, b):-1:1
            dest[j+1, :] .+= roots[i, :] .* dest[j, :]
        end
    end

    return
end

This function calculates elementary symmetric polynomials up to a given order b from a vector of complex numbers (roots). Specifically, given a vector X = {x_{1}, x_{2}, ..., x_{N}} and an integer b, it computes e_{a} = \sum_{i_{1} < i_{2} < \dots < i_{a+1}=1}^{N}x_{i_{1}}x_{i_{2}}\dots x_{i_{a}} for 0 \leq a \leq b.

Any suggestions on how to optimize this function further would be greatly appreciated.

I was first using the complex numbers from DoubleFloats.jl (using ComplexF64 leads to loss of precision for the sort of roots I am interested in, but in my testing, the precision offered by DoubleFloats is sufficient for the range of b I am interested in, which is for b up to 25), but then using MultiFloats.jl gives a nice speed up.

When I profile my code, I see that about half of the time is spent in multidimensional.jl _getindex. I am not sure what that is.

Any suggestions on how to optimize this function further would be greatly appreciated.

Two quick observations:

This allocates for root[i,:] and dest[j,:] because slices copy in Julia. The former is also constant for the inner loop. So try instead

    @views for i in axes(roots, 1)
        tmp = roots[i, :]
        for j in min(i, b):-1:1
            dest[j+1, :] .+= temp .* dest[j, :]
        end
    end

You order of indices should be suboptimal since Julia Arrays are column-major, which means you should iterate the left-most index in the inner most loop. In your code you have written a function that takes Matrix inputs but conceptually it is a operation on vectors. You kinda baked in a form of vectorization. I would consider this an antipattern in Julia. And note that this antipattern is responsible for the issue with the index order in this case.
So I would recommend you to either switch the indices on your matrices, i.e. transpose them effectively, or rewrite this function to take in Vectors instead and perform a broadcast over the different instances at a higher level in the code.
If you come from Matlab or similar: Note that in Julia you don’t need to vectorize computations for performances. A for loop has the same speed as broadcasting.

Also another comment on style: You don’t need to type annotate the function arguments. You can write this function completely generically, which makes it a lot easier to test difference number types :slight_smile: you are also almost there already since you use zero and one. Just drop the type annotations from the signature and use eltype(dest) where you hardcoded the types in the body.

Thanks for the help! I originally had a function that operated on vectors instead of matrices, and after reverting to that approach, I found it to be faster (I broadcast over the columns of dest and roots). I’ve also modified the function to work with matrices:

@inbounds function get_symmetric_polynomials!(dest::Matrix{Complex{MultiFloat{Float64, 2}}}, roots::Matrix{Complex{MultiFloat{Float64, 2}}}, b::Int64)

    dest[:, 1] .= one(Complex{MultiFloat{Float64, 2}})
    
    if b == 0
        return
    elseif b == 1
        dest[:, 2] .= sum(roots; dims=2)
        return
    end

    dest[:, 2:end] .= zero(Complex{MultiFloat{Float64, 2}})

    @views for i in axes(roots, 2)
        tmp = roots[:, i]
        for j in min(i, b):-1:1
            dest[:, j+1] .+= tmp .* dest[:, j]
        end
    end

    return
end

This matrix-based function performs just as well as column-wise broadcasting. I use type annotations mainly for debugging, but I will keep it in mind.