Optimization of array views


#1

Consider the following code

A = rand(1000,20);
res = zeros(size(A,1));

@time for i in 1:size(A,1)
	p = view(A, i, :)
	res[i] = sum(p)
end
0.004405 seconds (21.94 k allocations: 436.531 KiB)

Note the number of allocations and the memory usage. From what I understand, a view costs some memory to allocate, and one is being allocated in every iteration. But if a view is an immutable struct, why is it not being optimized away / reused across iterations?

PS: I am aware that I could take the sum along a dimension; I am using sum here only as an example.


#2

Did you try putting them into a function?


#3
function h()
	A = rand(1000,20);
	res = zeros(size(A,1));

	for i in 1:size(A,1)
		p = view(A, i, :)
		res[i] = sum(p)
	end
	res
end

@time h();
0.000078 seconds (1.01 k allocations: 211.297 KiB)

@time sum(A,2);
  0.000052 seconds (10 allocations: 8.188 KiB)

The results above are after a warmup run. It's faster, but 1k allocations remain.


#4

One issue is that your function h() includes the time and allocations needed to generate the random array. Rewriting it to accept A as an argument cuts the time in half. Views are still expensive, though: you’re allocating on each iteration of your inner loop for an otherwise cheap summation.

using BenchmarkTools
A = rand(1000, 20)

@btime h1()       # 53.892 μs (1003 allocations: 211.14 KiB)
@btime h2($A)     # 25.247 μs (1001 allocations: 54.81 KiB)
@btime sum($A, 2) # 4.455 μs (6 allocations: 8.03 KiB)
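
The definitions assumed in those benchmarks might look like the following (a sketch, not taken verbatim from the post: `h1` generates its own data, while `h2` takes `A` as an argument so only the loop is measured):

```julia
# h1: allocates its own data, so the timing includes rand()
function h1()
    A = rand(1000, 20)
    res = zeros(size(A, 1))
    for i in 1:size(A, 1)
        res[i] = sum(view(A, i, :))
    end
    res
end

# h2: takes A as an argument, isolating the view-and-sum loop itself
function h2(A)
    res = zeros(size(A, 1))
    for i in 1:size(A, 1)
        res[i] = sum(view(A, i, :))
    end
    res
end
```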

#5

Sorry for the incorrect example shared above. I was actually trying to understand why view objects are allocated on each loop iteration. They are defined as immutable structs with a fixed size, so shouldn’t the compiler be able to reuse the space allocated in the previous iteration, just with new parameters?



using BenchmarkTools
function m()
	A = rand(1000000,20);
	res = zeros(size(A,1));

	for i in 1:size(A,1)
		for j in 1:size(A,2)
			res[i] += A[i,j]
		end
	end
	res
end

@btime m();
78.411 ms (4 allocations: 160.22 MiB)

function h()
	A = rand(1000000,20);
	res = zeros(size(A,1));

	for i in 1:size(A,1)
		p = view(A, i, :)
		res[i] = sum(p)
	end
	res
end

@btime h();
122.569 ms (1000004 allocations: 205.99 MiB)

function p()
	A = rand(1000000,20)
	res = sum(A,2)
end

@btime p();
66.007 ms (9 allocations: 160.22 MiB)

#6

rand(1000000, 20) dominates the runtime of all your functions, so I’d highly recommend passing A as a function argument if you’re trying to isolate the performance of another part of the code.

On allocations, see here. Per Stefan, “Being able to stack-allocate objects that refer to the heap is an important case that we need to address, but doing so is non-trivial and hasn’t been done yet.”

See also #14955.
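
One way to see the quote in action (a sketch; `rowsum` is a hypothetical helper, and the behavior described is for 0.6/0.7-era Julia — newer versions can often elide the wrapper):

```julia
A = rand(1000, 20)

# A view is an immutable SubArray, but it holds a reference to its
# heap-allocated parent array; when the compiler can't prove the view
# doesn't escape (e.g. across a non-inlined call), the wrapper itself
# ends up heap-allocated too.
@noinline rowsum(A, i) = sum(view(A, i, :))

rowsum(A, 1)             # warm up / compile first
@allocated rowsum(A, 1)  # non-zero on 0.6/0.7: the SubArray wrapper is allocated
```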


#7

Try


julia> @inline function isum(x)
           out = zero(eltype(x))
           @simd for xᵢ ∈ x
               out += xᵢ
           end
           out
       end
isum (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark h2()
BenchmarkTools.Trial: 
  memory estimate:  164.27 KiB
  allocs estimate:  3
  --------------
  minimum time:     23.534 μs (0.00% GC)
  median time:      26.699 μs (0.00% GC)
  mean time:        37.349 μs (24.13% GC)
  maximum time:     48.556 ms (99.83% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark h()
BenchmarkTools.Trial: 
  memory estimate:  211.14 KiB
  allocs estimate:  1003
  --------------
  minimum time:     31.939 μs (0.00% GC)
  median time:      36.909 μs (0.00% GC)
  mean time:        52.521 μs (27.73% GC)
  maximum time:     42.334 ms (99.78% GC)
  --------------
  samples:          10000
  evals/sample:     1

where h2 is simply h, using isum instead of sum.
If the function call gets inlined, Julia can elide the allocations. This was on 0.7.
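
For completeness, the assumed `h2` would be `h` with `isum` substituted for `sum` (a sketch, repeating `isum` so the snippet is self-contained):

```julia
# Inlined summation, as defined above
@inline function isum(x)
    out = zero(eltype(x))
    @simd for xᵢ ∈ x
        out += xᵢ
    end
    out
end

function h2()
    A = rand(1000, 20)
    res = zeros(size(A, 1))
    for i in 1:size(A, 1)
        p = view(A, i, :)
        res[i] = isum(p)  # inlined call, so the view need not be heap-allocated
    end
    res
end
```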

If you can’t easily inline the functions you’re calling, try