Puzzled by the difference in performance when summing an array in different ways

Nicolas_Gagne · July 19, 2019, 5:16pm

Hi everybody,

While trying to test my understanding of “views vs slice of arrays” and “row-major vs column major”, I time (using @time) the following functions :

A = rand(10000,10000);
function sum_1(x)
    counter = 0.0
    for i in axes(x,2)
        counter += sum(x[:,i])
    end
    return counter
end

function sum_2(x)
    counter = 0.0
    for i in axes(x,2)
        counter += sum(view(x,:,i))
    end
    return counter
end

function sum_3(x)
    counter = 0.0
    for i in axes(A,2)
        counter += sum(view(x,i,:))
    end
    return counter
end

function sum_4(x)
    counter = 0.0
    for I in CartesianIndices(x)
        counter += x[I]
    end
end

function sum_5(x)
    counter = 0.0
    for i in x
        counter += i
    end
    return counter
end

function sum_6(x)
    return sum(x)
end

I was surprised to consistently observe the following time and memory allocation:
0.485504 seconds (88.47 k allocations: 764.900 MiB, 18.79% gc time)
0.044661 seconds (78.47 k allocations: 1.655 MiB)
0.015908 seconds (68.98 k allocations: 1.510 MiB)
0.110523 seconds (4 allocations: 160 bytes)
0.117049 seconds (5 allocations: 176 bytes)
0.043822 seconds (5 allocations: 176 bytes)

This is not what I would have predicted. I would have predicted the column-major view to be faster than the row-major view, and in both cases I wouldn’t have predicted such high memory allocation.

Can anyone shed some light on the above?
Thank you!

rdeits · July 19, 2019, 5:45pm

There are a couple of issues with your benchmarks that might be affecting the results. First of all, try using BenchmarkTools.jl to get more accurate timing results (and automatically avoid accidentally measuring compilation time instead of runtime). For example:

julia> using BenchmarkTools

julia> @btime sum_1($A)  # check out the BenchmarkTools.jl manual for why the $ is needed here
  130.168 ms (20000 allocations: 763.70 MiB)
5.0001936938218646e7

Also, your sum_3 function looks like it accidentally references the global variable A in axes(A, 2) rather than the local variable x.

Do those changes affect your results?

Nicolas_Gagne · July 19, 2019, 5:58pm

Thank you for your swift response. It indeed made a difference, I am now getting:
538.940 ms (20000 allocations: 763.70 MiB)
50.070 ms (10000 allocations: 468.75 KiB)
1.092 s (10000 allocations: 468.75 KiB)
111.176 ms (0 allocations: 0 bytes)
109.877 ms (0 allocations: 0 bytes)
48.378 ms (0 allocations: 0 bytes)

Which makes a lot more sense!
Do you understand why sum_2 is competitive with sum_6 in terms of time despite the many memory allocations?

rdeits · July 19, 2019, 6:17pm

I’m not sure, but I would guess that it’s because each column of your matrix is quite large. There is some cost to creating a view (unless the compiler can decide not to actually allocate it), but you are only paying that cost once per 10,000 elements. I bet that if you tried the same benchmark with a 10x10000 matrix you would see sum_6 perform better relative to sum_2.

Nicolas_Gagne · July 19, 2019, 6:59pm

You are right, with a 10x10000 matrix, sum_6 out performs sum_2. Thanks again!

Topic		Replies	Views
Reduce memory allocated in array view and in place sum Performance question	12	694	November 10, 2023
Performance deviations when summing an integer matrix Performance	3	353	October 23, 2021
Optimization of array views Performance	6	619	August 8, 2018
Looping over rows and columns: what am I doing wrong? Performance	4	1115	February 17, 2020
Avoiding allocations in `view`s General Usage	7	2221	July 4, 2019

Puzzled by the difference in performance when summing an array in different ways

Related topics