The columns and the rows - puzzled about performance

Dear all,

I have been using Julia for about three weeks now and, to my understanding, it uses column-major storage for matrices. If I understand correctly, that would mean that loops whose inner-most loop runs over columns (the second index, right?) should be more efficient than the other way around.

However, that is not what I observe in my numerical experiments. This code fragment shows a typical construct from my Julia program:

NC = 40000000

bu = ones(3,NC);  fc = ones(3,NC);  fu = ones(3,NC)
γ1 = 0.73;  γ2 = 1.0 - γ1

function F(bu::Matrix{Float64},
           fc::Matrix{Float64},
           fu::Matrix{Float64},
           γ1::Float64,
           γ2::Float64,
           NT::Int64)
  @time for i = 1 : NT   # outer loop over columns (second index)
    for j = 1 : 3        # inner loop over rows (first index)
      bu[j,i] = bu[j,i] + γ1 * fc[j,i] + γ2 * fu[j,i]
    end
  end
end

println("Invoke with: F(bu, fc, fu, γ1, γ2, NC)")

This gives me times around 0.24 seconds. The fragment above uses the inner-most loop j to run over rows (the first index), so it should be inefficient. However, if I rewrite this piece of code so that the inner-most loop runs over columns (the second index), like this:

NC = 40000000

bu = ones(NC,3);  fc = ones(NC,3);  fu = ones(NC,3)
γ1 = 0.73;  γ2 = 1.0 - γ1

function F(bu::Matrix{Float64},
           fc::Matrix{Float64},
           fu::Matrix{Float64},
           γ1::Float64,
           γ2::Float64,
           NT::Int64)
  @time for i = 1 : NT   # outer loop over rows (first index)
    for j = 1 : 3        # inner loop over columns (second index)
      bu[i,j] = bu[i,j] + γ1 * fc[i,j] + γ2 * fu[i,j]
    end
  end
end

println("Invoke with: F(bu, fc, fu, γ1, γ2, NC)")

it is consistently some 20% slower, i.e. the execution times are on the order of 0.3 s.

What is going on here? Am I confusing columns with rows?

Please let me know what you think.

Cheers

Linking the corresponding manual section.

1 Like

Thanks. So, if the construct:

  for col = inds, row = inds
    out[row, col] = x[row]
  end

is equivalent to:

  for col = inds
    for row = inds
      out[row, col] = x[row]
    end
  end

which is the more efficient way to traverse a matrix (and is also consistent with what I observe), then the inner-most loop should go over rows, not over columns, right?

Unless I have always misunderstood it, that’s what column-major implies: a column is stored contiguously in memory, so you want subsequent iterations of your loop to access elements in the same column as much as possible. You do this by going row-by-row down the first column, then moving to the second column (which follows the first column in memory) and going through all elements of that column row-by-row, and so on.
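To see this in action, here is a minimal sketch (the function names and the matrix size are just illustrative, not taken from the thread): summing a matrix with the row index innermost versus the column index innermost.

function sum_cols_first(A)                 # inner loop walks down each column
    s = 0.0
    for j in axes(A, 2), i in axes(A, 1)   # i (row index) is the innermost loop
        s += A[i, j]
    end
    return s
end

function sum_rows_first(A)                 # inner loop walks along each row
    s = 0.0
    for i in axes(A, 1), j in axes(A, 2)   # j (column index) is the innermost loop
        s += A[i, j]
    end
    return s
end

A = rand(5_000, 5_000)
sum_cols_first(A); sum_rows_first(A)       # warm up: the first calls include compilation
@time sum_cols_first(A)                    # contiguous memory access
@time sum_rows_first(A)                    # strided access, typically slower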

3 Likes

Neatly explained, nilshg, now I get a clear picture 🙂

Notice that the type annotations do not help performance (while @inbounds may):

NC = 40000000

bu = ones(3, NC);  fc = ones(3, NC);  fu = ones(3, NC)
γ1 = 0.73;  γ2 = 1.0 - γ1

function F(bu, fc, fu, γ1, γ2)
    nr, nc = size(bu)            # rows first, then columns (bu is 3 × NC)
    @inbounds for j = 1:nc       # outer loop over columns
        for i = 1:nr             # inner loop over rows, contiguous in memory
            bu[i,j] = bu[i,j] + γ1 * fc[i,j] + γ2 * fu[i,j]
        end
    end
end


@time  F(bu, fc, fu, γ1, γ2)   # first call includes compilation time
@time  F(bu, fc, fu, γ1, γ2)
@time  F(bu, fc, fu, γ1, γ2)

println("Invoke with: F(bu, fc, fu, γ1, γ2)")
1 Like

Thanks for the tip!

1 Like

Although it seems already sufficiently clarified, you may perhaps still want to have a look at my take in Why column major? - #36 by zdenek_hurak.

2 Likes

For the record, the Wikipedia article Row- and column-major order is a valuable resource on this topic too.
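As a quick, self-contained illustration of what that layout means in Julia (the small matrix here is just an example):

A = [1 2 3;
     4 5 6]        # a 2×3 matrix
vec(A)             # memory order: [1, 4, 2, 5, 3, 6]; columns are stored back to back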

3 Likes