# Reducing allocations and run time

Hi,
I’m running a program and trying to speed it up. I don’t get the logic of views, or any other way to reduce the number of allocations. A simplified example is

``````aa = [4 3; 4 5]
bb = [1 2;6 3]
cc = zeros(2,2,2)
@time @views for n=1:2 for m=1:2
cc[:,n,m] = aa[n,:].*bb[:,m]
end end
``````

which gives

``````0.000057 seconds (36 allocations: 1.812 KiB)
``````

in a bigger example the number of allocations is much larger. How can views bring allocations to 0?

Thanks

You need to do the assignment in-place with .= because otherwise the block of memory resulting from the multiplication is allocated before it’s written to the specified range in your result array.

2 Likes

Not sure, if I use

``````cc[:,n,m] .=
``````

I get

``````0.000059 seconds (40 allocations: 1.812 KiB)
``````

Ok I assumed you were not running this at global scope, but it seems you are? Put your code in a function so that the types of all variables are known and constant.

This is one of the most common reasons for superfluous allocations and bad performance, if you reference variables that are just defined at the top level. You could always reassign their values if they are not marked const, so the compiled code will assume they can have any type. Putting stuff into functions, marking variables const or using let blocks are ways to mitigate this

3 Likes

An additional speed up by using also `@inbounds`

I had created a bad example, outside a function. This is a similar example where I put the code into a function

``````Tk=10
Tp=16
S=104
N=336
par1 = 2
mat1 = rand(Tk,S,4)
mat2 = rand(Tp,N,4)
mat3 = rand(S,N,4)
include("Untitled.jl")

function myfun(Tk,Tp,S,N,i,mat1,mat2,mat3,par1)
reset_timer!()

mat4 = mat3.^par1
y = zeros(Tk,Tp,S,N)

@timeit "mytime" for n=1:N for s in 1:S#@timeit "create indV"
y[:,:,s,n] = mat1[:,s,i]*mat2[:,n,i]'/mat4[s,n,i]

end end
print_timer()
return y
end

myfun(Tk,Tp,S,N,1,mat1,mat2,mat3,par1)
``````

The timer still gives

``````                          Time                   Allocations
──────────────────────   ───────────────────────
Tot / % measured:     40.4ms / 86.4%            148MiB / 70.5%

Section   ncalls     time   %tot     avg     alloc   %tot      avg
──────────────────────────────────────────────────────────────────
mytime         1   34.9ms   100%  34.9ms    105MiB  100%    105MiB
``````

with and without .=

I guess a more general question is how to generate 0 allocations (or close to) in this example.

Now you’re not using views it seems. Also your example seems to differ a bit in that you’re not doing broadcasted multiplication anymore but some sort of matrix multiplication, which will allocate unless you use preexisting memory for the result. I think that’s what mul! does but I’m not much of a linalg person. Basically think about whether your expressions create intermediary arrays. If yes, try to write those results into already existing memory with mutating functions or the previously mentioned broadcasted assignment. In your example you’re creating copies with your indexing syntax, then multiplying and dividing and therefore creating a bunch of intermediary arrays.

1 Like

1. provide an actual minimal working example. Your code does not run (clearly, because of the “Untitled.jl” file).

2. Put these two lines outside the function and pass the matrices as parameters as well in the tests. Thes lines obviously allocate, and it will be better to have them outside the tests:

1. I am not sure what `@timeit` actually does (have not seen it here before). The most usual package for benchmarking is `BenchmarkTools`, and you could use `@btime` on the call to `myfun` with the new `y` and `mat4` parameters, to work on those allocations.
1 Like

It’s not identical, but a MWE with 3 options:

``````using BenchmarkTools, LinearAlgebra, Random

function myfun!(y, mat1, mat2, iCase)
Tk, S = size(mat1);
Tp, N = size(mat2);

if iCase == 1
yPart = zeros(Tk, Tp);
end
@inbounds @views for n=1:N
for s in 1:S
if iCase == 1
mul!(yPart, mat1[:,s], mat2[:,n]');
y[:,:,s,n] .= yPart;
elseif iCase == 2
y[:,:,s,n] .= mat1[:,s]*mat2[:,n]';
elseif iCase == 3
for i1 = 1 : Tk
for i2 = 1 : Tp
y[i1,i2,s,n] = mat1[i1,s]*mat2[i2,n];
end
end
end
end
end
return nothing
end

function bmark(iCase)
rng = MersenneTwister(123)
Tk=10
Tp=16
S=104
N=336
mat1 = rand(rng, Tk,S)
mat2 = rand(rng, Tp,N)
y = zeros(Tk,Tp,S,N)
b = @benchmarkable myfun!(\$y, \$mat1, \$mat2, \$iCase);
run(b)
end
``````

Unrolling the matrix multiplication avoids allocations and wins in terms of speed. But `mul!` helps some.

``````julia> bmark(1)
BenchmarkTools.Trial:
memory estimate:  1.33 KiB
allocs estimate:  1
--------------
minimum time:     8.877 ms (0.00% GC)
median time:      9.118 ms (0.00% GC)
mean time:        9.320 ms (0.00% GC)
maximum time:     13.458 ms (0.00% GC)
--------------
samples:          537
evals/sample:     1

julia> bmark(2)
BenchmarkTools.Trial:
memory estimate:  45.32 MiB
allocs estimate:  34945
--------------
minimum time:     11.003 ms (0.00% GC)
median time:      12.034 ms (0.00% GC)
mean time:        18.336 ms (32.68% GC)
maximum time:     76.342 ms (84.29% GC)
--------------
samples:          273
evals/sample:     1

julia> bmark(3)
BenchmarkTools.Trial:
memory estimate:  0 bytes
allocs estimate:  0
--------------
minimum time:     3.754 ms (0.00% GC)
median time:      4.003 ms (0.00% GC)
mean time:        4.180 ms (0.00% GC)
maximum time:     11.018 ms (0.00% GC)
--------------
samples:          1196
evals/sample:     1
``````

I’m sure there must be a function that implements `A B'` without allocating, but I don’t know what it is.

1 Like