Difference between indexing a for loop

raminammour · June 16, 2017, 2:01pm

Hello all,

Can someone please explain to me why the following happens. I know that I am accessing the array in the wrong order, it is necessary to reproduce the allocation and slowdown. This came up in an actual application and I could not figure out why:



function f1(A)
    n1=size(A,1)
    n2=size(A,2)
    s=0.
    for i=1:n1,j=1:n2-1:n2
        s+=A[i,j]
    end
    return s
end

function f2(A)
    n1=size(A,1)
    n2=size(A,2)
    s=0.
    for i=1:n1, j in [1,n2]
        s+=A[i,j]
    end
    return s
end
n=10;A=rand(n,n)
f1(A);f2(A);
n=10000;A=rand(n,n)
@time s1=f1(A);
@time s2=f2(A);
s1-s2

  0.000608 seconds (5 allocations: 176 bytes)
  0.001324 seconds (20.00 k allocations: 1.068 MB)
  0.0

Thanks!

ChrisRackauckas · June 16, 2017, 2:30pm

One is an array, the other isn’t? You have the cost of creating an array in the second.

But I don’t even think that’s the issue.

The first time you call a function it will compile. What happens if you time it like:

@time s1=f1(A);
@time s1=f1(A);
@time s2=f2(A);
@time s2=f2(A);

?

mauro3 · June 16, 2017, 2:31pm

It should probably be:

function f2(A)
    n1=size(A,1)
    n2=size(A,2)
    s=0.
    for i=1:n1, j in 1:n2-1:n2 # <---
        s+=A[i,j]
    end
    return s
end

in and = are identical in their for-loop usage.

raminammour · June 16, 2017, 3:54pm

The functions were compiled before, I didn’t post it.

I think you are right about the array issue if you change the function to:

function f1(A)
    n1=size(A,1)
    n2=size(A,2)
    s=0.
    for i=1:n1,j=1:n2-1:n2
        s+=A[i,j]
    end
    return s
end

function f2(A)
    n1=size(A,1)
    n2=size(A,2)
    s=0.
    inds=[1,n2]
    for i=1:n1, j in inds
        s+=A[i,j]
    end
    return s
end
n=10;A=rand(n,n)
f1(A);f2(A);
n=10000;A=rand(n,n)
@time s1=f1(A);
@time s2=f2(A);
@time s1=f1(A);
@time s2=f2(A);
s1-s2

  0.000602 seconds (5 allocations: 176 bytes)
  0.000083 seconds (7 allocations: 288 bytes)
  0.000581 seconds (5 allocations: 176 bytes)
  0.000080 seconds (7 allocations: 288 bytes)

It becomes even faster! Which I don’t understand either.

I guess in the first version a 2 vector keeps getting created and garbage collected; I thought the compiler could figure out that it was a constant and pre-compute it but it didn’t.

Or I am just lost, which is a perfectly good explanation

Thanks for the hint.

ScottPJones · June 17, 2017, 4:49am

Another thing to note, arrays are column-major in Julia, so you’ll get better performance doing:

for j = 1:(n2-1):n2, i = 1:n1 ; s += A[i, j] ; end

ericsfraga · June 19, 2017, 6:55am

Hello all,

Can someone please explain to me why the following happens. I know
that I am accessing the array in the wrong order, it is necessary to
reproduce the allocation and slowdown. This came up in an actual
application and I could not figure out why:

Cannot answer your question as such (I’m definitely a n00b in Julia) but
do note that the two ranges for j are different?

raminammour · June 19, 2017, 1:36pm

The j ranges are the same. The issue was that the vector [1,n2] was being created and garbage collected inside the loop at every iteration, which is slow. With the step range 1:n2-1:n2 that does not happen.

anon61610682 · June 19, 2017, 2:04pm

I am not sure if you have intended to use the code for matrices of dimensions min(m,n) > 1, but in f1, there is a bug: when n2 == 1, you get an error. Plus, why would you iterate if you know ahead of time that you will use the first and last columns of the matrix?

Below code excerpt will save you one loop, and hence, is more efficient:

function f1(A)
    n1=size(A,1)
    n2=size(A,2)
    s=0.
    for i=1:n1,j=1:n2-1:n2
        s+=A[i,j]
    end
    return s
end

function f2(A)
    n1=size(A,1)
    n2=size(A,2)
    s=0.
    inds=[1,n2]
    for i=1:n1, j in inds
        s+=A[i,j]
    end
    return s
end

function f3(A::AbstractMatrix)
  n1, n2  = size(A)
  s       = zero(eltype(A))
  for row in 1:n1
    s += A[row, 1] + A[row, n2]
  end
  s
end

function f4{T<:AbstractFloat}(A::AbstractMatrix{T})
  n1, n2  = size(A)
  s       = zero(T)
  for row in 1:n1
    s += A[row, 1] + A[row, n2]
  end
  s
end

function f4{T<:Integer}(A::AbstractMatrix{T})
  n1, n2  = size(A)
  s       = zero(widen(T))
  for row in 1:nrows
    s += A[row, 1] + A[row, n2]
  end
  s
end

using BenchmarkTools

n=10000; A = rand(n, n);

@benchmark f1(A) # 181.749 us
@benchmark f2(A) # 24.275 us
@benchmark f3(A) # 11.238 us
@benchmark f4(A) # 11.240 us

If you still want to use iterations, you should put an assertion for n2 > 1. And in f3 above, for matrices having Integer elements, s might overflow. Overloading {T<:Integer}(A::AbstractMatrix{T}) with s = zero(widen(eltype(A))) would solve this problem. For AbstractFloats, you don’t have this problem (cf. f4 above).

Cheers!

raminammour · June 19, 2017, 2:40pm

Thanks for the reply.

This code is just meant to reproduce the behavior I was seeing in another application, n>>>1 and I am not all that interested in summing matrices .

This typically comes up when you iterate on an array, and then you want to do something different at the ends (think boundary conditions). Keeping the same code structure makes for a bit more elegant code. I know that it will always be faster if I rewrote the end cases as two explicit loops, but as dimensions grow, it gets tedious. The end conditions should be much less costly than the body of the loop, until something like my original f2() happens.

Cheers!

stevengj · June 19, 2017, 3:35pm

For boundary conditions at the ends, the most elegant, flexible, and efficient technique is often to use ghost elements. See my comment about this in another thread on discourse.

mkborregaard · June 19, 2017, 4:12pm

link Arrays with periodic boundaries - #4 by stevengj

pint · June 19, 2017, 4:19pm

as a tule of thumb, instead of a constant array, you should always try to use a constant tuple. here is my version, which is faster than the Range version according to my measurements:

function f3(A)
    n1=size(A,1)
    n2=size(A,2)
    s=0.0
    for i=1:n1,j=(1,n2)
        s+=A[i,j]
    end
    return s
end

Topic		Replies	Views
Performance difference between two code New to Julia	3	782	January 18, 2017
Performance difference depending on order of comparison in loop Performance performance , indexing	3	338	December 2, 2023
Strange behavior with array calls General Usage question , performance	8	493	January 2, 2019
3 different ways to create an array have different performance Performance	1	967	March 10, 2018
Puzzled by the difference in performance when summing an array in different ways Performance array , memory-allocation	4	876	July 19, 2019

Difference between indexing a for loop

Related topics