Hello all, it’s nice to be using and posting about Julia.

I need to use the maximum() function on SubArrays, and have observed some results which confuse me based on the shape of the original array. Essentially, I have a large m and small n, and would have thought that an m by n matrix would yield the better performance due to the column major ordering. However, that is not what I’m observing in terms of memory when using the maximum() function, as in the below example.

```
function test()
x = rand(512,2) # Array is 512 by 2
y = x' # Array is 2 by 512
@time results1 = maximum(view(x,1:128,:))
@time results2 = maximum(view(y,:,1:128))
# Check the calculation was the same with the percentage error
println(maximum(abs,100.0*(results1-results2)/results1))
end
```

test()

0.000003 seconds (4 allocations: 144 bytes)

0.000002 seconds (1 allocation: 48 bytes)

0.0

I have tried using 0.5.0 and 0.6.0-rc3 on two different Windows machines, using the Atom IDE.

Is there something that I’m overlooking?

(Please quote your code using backticks.)

Check out the BenchmarkTooks.jl package for correct micro-benchmarking.

You need to run the code twice if you don’t use that package.

Thank you for the tip, I will investigate this package. In the meantime, I’ve added the a bit about my development environment and machine.

I have indeed run the code multiple times and the memory allocation benchmarks are invariant (after the first run affected by compilation).

Using the following:

```
using BenchmarkTools
function test()
x = rand(512,2) # Array is 512 by 2
y = x' # Array is 2 by 512
@btime results1 = maximum(view(x, 1:128, :))
@btime results2 = maximum(view(y, :, 1:128))
# Check the calculation was the same with the percentage error
println(maximum(abs,100.0*(results1-results2)/results1))
end
```

I get

```
julia 0.6> test()
5.572 μs (35 allocations: 896 bytes)
5.578 μs (37 allocations: 944 bytes)
0.0
```

Note the use of `@btime`

(with a `b`

), from `BenchmarkTools.jl`

, instead of `@time`

.

1 Like

Welcome to Julia @bingcrosby!

Another tip to improve your future code, you can use `isapprox(x,y)`

to check if `x`

and `y`

are approximately the same. You can also use the unicode version of the function by typing `\approx`

and pressing `TAB`

.

3 Likes

Thanks for the example. However, this code throws,

```
julia 0.6> test()
UndefVarError: x not defined.
```

I’ve had more of a look at the problem and got Benchmark Tools working correctly. I’ve also implemented the trivial matrix max.

```
using BenchmarkTools
function my2DMatrixMax(x::SubArray{Float64,2})
max = 0.0
@inbounds for j = 1:size(x,2)
for i = 1:size(x,1)
if x[i,j] > max
max = x[i,j]
end
end
end
end
function test()
x = rand(512,2) # Array is 512 by 2
y = x' # Array is 2 by 512
results1 = 0.0; results2 = 0.0; results3 = 0.0; results4 = 0.0
# Based on FORTRAN ordering, the first first and third results should be quickest
@btime $results1 = maximum(view($x,1:128,:))
@btime $results2 = maximum(view($y,:,1:128))
@btime $results3 = my2DMatrixMax(view($x,1:128,:))
@btime $results4 = my2DMatrixMax(view($y,:,1:128))
# Check the calculation was the same with the percentage error
println(isapprox(results1,results2))
println(isapprox(results2,results3))
println(isapprox(results4,results4))
end
```

Running test() returns the following output:

```
julia> test()
1.079 μs (4 allocations: 160 bytes)
730.280 ns (1 allocation: 64 bytes)
322.061 ns (1 allocation: 64 bytes)
845.157 ns (1 allocation: 64 bytes)
true
true
true
```

Also, running Julia with --check-bounds=no does not change the numbers. My function performs as I would expect with regards to memory ordering, but the built in function does not. I am also surprised that it is slow in both cases (relative to the results3 example).

Could anybody elucidate the performance I’m seeing?

For the `view(y,:,1:128)`

, the Base implementation of `mapreduce`

realizes that you are actually looking at a memory-

contiguous subset of `y`

and it uses fast “linear indexing”. Specifically, this is because `Base.IndexStyle(view(y,:,1:128))`

returns `IndexLinear()`

.

For `view(x,1:128,:)`

, `Base.IndexStyle`

returns `IndexCartesian()`

: because the view is not memory-contiguous, it uses `mapfoldl`

, which uses the much slower `start`

/`next`

/`done`

iteration. It seems like this is something that could be sped up for `AbstractArray`

by using explicit loops over the coordinates.