Maximum(x) uses memory differently for different sized array views


#1

Hello all, it’s nice to be using and posting about Julia.

I need to use the maximum() function on SubArrays, and I have observed some confusing results that depend on the shape of the original array. Essentially, I have a large m and small n, and would have thought that an m by n matrix would yield better performance due to the column-major ordering. However, that is not what I’m observing in terms of memory when using the maximum() function, as in the example below.
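(Quick aside for readers, not part of the original post: Julia stores arrays in column-major order, so elements within a column are adjacent in memory. A minimal illustration:)

```julia
A = [1 3 5;
     2 4 6]      # 2×3 matrix
vec(A)           # walks memory in column-major order → [1, 2, 3, 4, 5, 6]
```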

function test()
  x = rand(512,2) # Array is 512 by 2
  y = x' # Array is 2 by 512

  @time results1 = maximum(view(x,1:128,:))
  @time results2 = maximum(view(y,:,1:128))

  # Check the calculation was the same with the percentage error
  println(maximum(abs,100.0*(results1-results2)/results1))
end

test()
0.000003 seconds (4 allocations: 144 bytes)
0.000002 seconds (1 allocation: 48 bytes)
0.0

I have tried Julia 0.5.0 and 0.6.0-rc3 on two different Windows machines, using the Atom IDE.
Is there something that I’m overlooking?


#2

(Please quote your code using backticks.)

Check out the BenchmarkTools.jl package for correct micro-benchmarking.

If you don’t use that package, you need to run the code twice, since the first call includes compilation time.
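To illustrate (a minimal sketch): the first call to a freshly defined function triggers JIT compilation, so @time on that call is dominated by compile time and its allocations, while later calls reflect only the run itself:

```julia
f(v) = sum(v)
v = rand(1000)
@time f(v)  # first call: includes compilation time and compiler allocations
@time f(v)  # second call: measures only the actual work
```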


#3

Thank you for the tip, I will investigate this package. In the meantime, I’ve added a bit about my development environment and machine.

I have indeed run the code multiple times and the memory allocation benchmarks are invariant (after the first run affected by compilation).


#4

Using the following:

using BenchmarkTools

function test()
    x = rand(512,2) # Array is 512 by 2
    y = x' # Array is 2 by 512

    @btime results1 = maximum(view(x, 1:128, :))
    @btime results2 = maximum(view(y, :, 1:128))

    # Check the calculation was the same with the percentage error
    println(maximum(abs,100.0*(results1-results2)/results1))
end

I get

julia 0.6> test()
  5.572 μs (35 allocations: 896 bytes)
  5.578 μs (37 allocations: 944 bytes)
0.0

Note the use of @btime (with a b), from BenchmarkTools.jl, instead of @time.
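(A side note, following the pattern recommended in the BenchmarkTools documentation: variables should be interpolated into @btime with `$`, otherwise they are treated as globals — and a local variable inside a function is not visible to the benchmark at all without interpolation. A minimal sketch:)

```julia
using BenchmarkTools

x = rand(512, 2)
# $x splices the value of x into the benchmark expression,
# so the timing measures the call itself, not global-variable lookup:
@btime maximum(view($x, 1:128, :))
```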


#5

Welcome to Julia @bingcrosby!

Another tip to improve your future code, you can use isapprox(x,y) to check if x and y are approximately the same. You can also use the unicode version of the function by typing \approx and pressing TAB.
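For instance (a small illustration of the tip above):

```julia
a = 0.1 + 0.2
b = 0.3
a == b          # false — floating-point rounding makes exact equality fail
isapprox(a, b)  # true
a ≈ b           # true — ≈ is typed as \approx then TAB
```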


#6

Thanks for the example. However, this code throws:

julia 0.6> test()
UndefVarError: x not defined.

#7

I’ve had more of a look at the problem and got BenchmarkTools working correctly. I’ve also implemented the trivial matrix max.

using BenchmarkTools

function my2DMatrixMax(x::SubArray{Float64,2})
    max = -Inf # start below any element so negative entries are handled too
    @inbounds for j = 1:size(x,2)
         for i = 1:size(x,1)
            if x[i,j] > max
                max = x[i,j]
            end
        end
    end
    return max
end

function test()
  x = rand(512,2) # Array is 512 by 2
  y = x' # Array is 2 by 512

  results1 = 0.0; results2 = 0.0; results3 = 0.0; results4 = 0.0

  # Based on column-major (FORTRAN) ordering, the first and third results should be quickest
  @btime $results1 = maximum(view($x,1:128,:))
  @btime $results2 = maximum(view($y,:,1:128))
  @btime $results3 = my2DMatrixMax(view($x,1:128,:))
  @btime $results4 = my2DMatrixMax(view($y,:,1:128))

  # Check the calculation was the same with the percentage error
  println(isapprox(results1,results2))
  println(isapprox(results2,results3))
  println(isapprox(results3,results4))
end

Running test() returns the following output:

julia> test()
  1.079 μs (4 allocations: 160 bytes)
  730.280 ns (1 allocation: 64 bytes)
  322.061 ns (1 allocation: 64 bytes)
  845.157 ns (1 allocation: 64 bytes)
true
true
true

Also, running Julia with --check-bounds=no does not change the numbers. My function performs as I would expect with regard to memory ordering, but the built-in function does not. I am also surprised that it is slow in both cases (relative to the results3 example).

Could anybody elucidate the performance I’m seeing?


#8

For view(y,:,1:128), the Base implementation of mapreduce realizes that you are actually looking at a memory-contiguous subset of y and uses fast “linear indexing”. Specifically, this is because Base.IndexStyle(view(y,:,1:128)) returns IndexLinear().

For view(x,1:128,:), Base.IndexStyle returns IndexCartesian(): because the view is not memory-contiguous, it falls back to mapfoldl, which uses the much slower start/next/done iteration. It seems like this is something that could be sped up for AbstractArray by using explicit loops over the coordinates.
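This is easy to check directly (a sketch; note that on Julia 1.x, x' is a lazy Adjoint wrapper, so collect is used here to materialize a plain 2×512 matrix as in the original example):

```julia
x = rand(512, 2)
y = collect(x')  # plain 2×512 Array, as x' was on 0.6

# Full leading dimension plus a contiguous column range → memory-contiguous:
IndexStyle(view(y, :, 1:128))  # IndexLinear()

# A restricted row range is not contiguous in memory:
IndexStyle(view(x, 1:128, :))  # IndexCartesian()
```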