Why is this Julia code considerably slower than Matlab

Loop fusion will actually be slower here. This is clearer if we simplify the code a bit. Suppose we are doing X .= colvec .* cis.(rowvec), i.e. combining a column vector and a row vector to make a matrix (cis(x) computes the complex exponential exp(im*x)). This is lowered to broadcast!((x,y) -> x * cis(y), X, colvec, rowvec), which is essentially equivalent to:

for j = 1:length(rowvec), i = 1:length(colvec)
    X[i,j] = colvec[i] * cis(rowvec[j])   # cis is re-evaluated for every (i,j) pair
end

The problem here is that if X is m×n, then we end up calling the cis function mn times.

If, instead, we use

tmp = cis.(rowvec)
X .= colvec .* tmp

it only calls the cis function n times, at the expense of requiring more storage.
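
You can check this tradeoff directly with a quick benchmark. Here is a sketch (assuming you have the BenchmarkTools package installed; the sizes and names are just for illustration):

using BenchmarkTools

colvec = rand(1000)
rowvec = rand(1000)'    # a 1×1000 row vector
X = zeros(Complex{Float64}, 1000, 1000)
tmp = zeros(Complex{Float64}, 1, 1000)

# fused: a single pass over X, but 10^6 calls to cis
@btime $X .= $colvec .* cis.($rowvec)

# staged: only 10^3 calls to cis, at the cost of the tmp buffer
@btime begin
    $tmp .= cis.($rowvec)
    $X .= $colvec .* $tmp
end

The staged version should win by a wide margin here, since a complex exponential costs far more than a multiplication.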

This is the sort of space/time tradeoff that is hard to automate. As a rule of thumb, however, if a broadcast operation combines a row vector and a column vector, it will be faster if you do any expensive operations on the row and column vectors before doing the broadcast combination.

In this particular case, I would suggest doing something like:

function test_perf5()
    rangeᵀ = (1:2000000)'
    rowtemp = similar(rangeᵀ, Complex{Float64})   # preallocated 1×N row buffer
    steering_vectors = complex.(ones(4,11), ones(4,11)) # placeholder for actual vectors?

    sum_signal = zeros(Complex{Float64}, 4, length(rangeᵀ))
    for i = 1:11
        # (with this placeholder the phase doesn't depend on i; presumably it does in your real code)
        rowtemp .= cis.(1.6069246423111792 .* rangeᵀ .+ 0.6981317007977318)
        sum_signal .+= steering_vectors[:,i] .* rowtemp
    end
    return sum_signal
end
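
When timing it, remember that the first call includes JIT compilation:

test_perf5();          # first call compiles
@time test_perf5();    # later calls measure just the work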

You could also try @views, or simply store steering_vectors as an array of vectors, to avoid allocating a copy for steering_vectors[:,i] on every iteration. You can also put @. before the for loop and then omit all of the dots.
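
For example, the @views variant of the loop would look something like this (a sketch along the lines of test_perf5 above; I haven't benchmarked it):

function test_perf5_views()
    rangeᵀ = (1:2000000)'
    rowtemp = similar(rangeᵀ, Complex{Float64})
    steering_vectors = complex.(ones(4,11), ones(4,11)) # placeholder for actual vectors?

    sum_signal = zeros(Complex{Float64}, 4, length(rangeᵀ))
    @views for i = 1:11
        # steering_vectors[:,i] is now a view, so no copy is allocated per iteration
        rowtemp .= cis.(1.6069246423111792 .* rangeᵀ .+ 0.6981317007977318)
        sum_signal .+= steering_vectors[:,i] .* rowtemp
    end
    return sum_signal
end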
