Performance of OffsetArrays


#1

Hello,

I tried to use OffsetArrays.jl in my own (finite difference) codes and experienced a sizable slowdown.Here is a code that resembles my use cases.

function lap2(x,y)
    n1,n2=size(x)
    for i2=2:n2-1
      for i1=2:n1-1
            @inbounds y[i1,i2]=(x[i1+1,i2]-2x[i1,i2]+x[i1-1,i2])*x[i1,i2]*x[i1+1,i2+1]+
                (x[i1,i2-1]-2x[i1,i2]+x[i1,i2-1])*x[i1-1,i2]*x[i1,i2+1]
        end
    end
end
function lap2o(x,y)
    I1=indices(x,1)[2:end-1]
    I2=indices(x,2)[2:end-1]
    @unsafe begin
    for i2 in I2
          for i1 in I1
           (y[i1,i2]=(x[i1+1,i2]-2x[i1,i2]+x[i1-1,i2])*x[i1,i2]*x[i1+1,i2+1]+
                (x[i1,i2-1]-2x[i1,i2]+x[i1,i2-1])*x[i1-1,i2]*x[i1,i2+1])
        end
    end
    end
end

using OffsetArrays,BenchmarkTools
x=rand(1001,2001);y=zeros(size(x));
ox=OffsetArray(copy(x),-100:900,-100:1900);oy=OffsetArray(copy(y),-100:900,-100:1900);
#warm up...
julia> @benchmark lap2o(ox,oy)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     4.339 ms (0.00% GC)
  median time:      4.363 ms (0.00% GC)
  mean time:        4.375 ms (0.00% GC)
  maximum time:     6.494 ms (0.00% GC)
  --------------
  samples:          1140
  evals/sample:     1

julia> @benchmark lap2(x,y)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     3.546 ms (0.00% GC)
  median time:      3.581 ms (0.00% GC)
  mean time:        3.594 ms (0.00% GC)
  maximum time:     4.586 ms (0.00% GC)
  --------------
  samples:          1387
  evals/sample:     1

The slowdown here is proportional to the one I am seeing for my codes. If I profile the two calls I see many calls to unsafe_setindex and unsafe_getindex in OffsetArrays but not for the normal Arrays.

Is this slowdown expected/normal? Is is system specific or am I just missing something?

OffsetArrays would clean up my codes and make them more readable if I can solve this slowdown issue.

Cheers!

julia> versioninfo()
Julia Version 0.6.0
Commit 9036443 (2017-06-19 13:05 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, sandybridge)

#2

Thanks for the excellent example. I see a hit too, though it’s smaller (about 10%). There are possibly two issues, although I’m guessing the first one is not actually relevant:

  • if the @unsafe macro doesn’t successfully inline things, that could be a problem. I doubt that’s happening. But in either case when I get some time I plan to “take off the training wheels” on OffsetArrays and allow @inbounds to do what it does for other array types.
  • fundamentally, LLVM should recognize that the offsets here can be compensated by adjusting the ranges at the beginning. One hunch is that this is failing due to our UnitRange constructor being not quite as straightforward as you might expect. This did cause a problem for @polly, if I remember correctly.

But I haven’t had time (and don’t have time) to dig into this right now, sorry about that. Might be worth filing a Julia issue just so this doesn’t get lost.


#3

Another theory is that LLVM can’t prove that ox and oy have the same offsets. What if you impose a check that the indices match (and have it throw an error if not) prior to the loop? That might theoretically let LLVM assume they are the same, and in any event would be much better practice anyway (since currently the code is vulnerable to a segfault).


#4

Thanks for the reply!

Checking for indices does not help. The performance hit persists even if the offset arrays are not offset at all ox=OffsetArray(copy(x),1:1001,1:2001).

If, however, the scheme only accesses the arrays at [i1,i2] (no i +- 1 terms) then you don’t see any performance difference.


#5

This is an old thread, but I’m pleased to report new information. On an up-to-the moment copy of Julia 0.7 (you need https://github.com/JuliaLang/julia/pull/24899 and a few others, also the new optimizer probably helps) and https://github.com/JuliaArrays/OffsetArrays.jl/pull/52, the performance gap (on my machine) is no longer present. Indeed, for reasons I don’t fully understand the OffsetArray version seems consistently a bit faster:

julia> @btime lap2o($ox,$oy)
  2.365 ms (0 allocations: 0 bytes)                                                                                                                                                          
                                                                                                                                                                                             
julia> @btime lap2($x,$y)
  2.588 ms (0 allocations: 0 bytes)                                                                                                                                                          
                                                                                                                                                                                             
julia> @btime lap2o($ox,$oy)
  2.372 ms (0 allocations: 0 bytes)                                                                                                                                                          
                                                                                                                                                                                             
julia> @btime lap2($x,$y)
  2.593 ms (0 allocations: 0 bytes)

#6

I can confirm and even see that the OffsetArray version is faster on Julia Version 0.7.0-alpha.0 and [6fe1bfb0] OffsetArrays v0.7.0. Now to rewriting codes in readable form…

Thanks for following up on this!


#7

Worth pointing out that using https://github.com/JuliaArrays/OffsetArrays.jl/pull/52 on anything earlier than julia 0.7.0-beta.270 is dangerous; you could easily get segfaults.


#8

Hi Tim,

Just wondering, since yesterday’s commit to OffsetArrays, while testing PtFEM.jl#master, I get an error on dot(a, b) where both a and b are zero based OffsetArrays:

PtFEM.jl: Error During Test at /Users/rob/.julia/dev/PtFEM/test/runtests.jl:39
Got exception LoadError("/Users/rob/.julia/dev/PtFEM/test/test_p56.1.jl", 33, ErrorException(“size not supported for arrays with axes (Base.Slice(0:124),); see http://docs.julialang.org/en/latest/devdocs/offset-arrays/”)) outside of a @test
LoadError: size not supported for arrays with axes (Base.Slice(0:124),); see http://docs.julialang.org/en/latest/devdocs/offset-arrays/

Further down it points to line 228, which is the dot(…) statement. But I’m not sure that line number is correct, although usually it is.

I also still get a boat load of

┌ Warning: _length is deprecated, use length instead.
│ caller = indexlength at OffsetArrays.jl:166 [inlined]
└ @ Core ~/.julia/packages/OffsetArrays/jFWq/src/OffsetArrays.jl:166

and you suggest to insert

_size(A::AbstractArray) = size(LinearIndices(A))
_size(A) = size(A)

_length(A::AbstractArray) = length(LinearIndices(A))
_length(A) = length(A)

I’ve done that in the PtFEM.jl module definition file, but that is not fixing it, so should I fork OffSetArrays?

Thanks a lot for your work on OffsetArrays, it is great to have this package!

Rob


#9

It seems like you’re probably not using https://github.com/JuliaArrays/OffsetArrays.jl/pull/52 (note it hasn’t been merged yet), since those error messages are generated by an earlier version of the package. Also you have to use a version of Julia less than 24 hours old.

For that OffsetArrays PR, I’m just giving people a chance to complain before merging it. Should happen sometime today.


#10

Thanks Tim, master now works fine!