# Repeatedly summing displaced array, lots of allocations and slow

I have two one-dimensional arrays of length nT:

• `greensfunc` is an array of Float64,
• `X` is an array of Float64 zeros.

I want to repeatedly multiply `greensfunc` by a random number, and add the result to X, but for each sum only consider the last `n-1`-elements of `X` (and first `n-1` of `greensfunc`):

``````using TimerOutputs

function summat(nT)

@timeit to "Initlialising vector" T  = LinRange(0,1,nT);
@timeit to "Initlialising vector" X = zeros(nT);

@timeit to "Greens function" greensfunc = exp.(-T/0.5) .* sin.(2*pi*T/0.5)

@timeit to "Integrating" begin
for n in 1:nT
X[n:end] =  (X[n:end]) + (sqrt(1/float(nT))*randn()).*(greensfunc[1:(end-n+1)])
end
end

return X

end

to = TimerOutput();
@time sol = summat(10000);
show(to)
``````

The loop however has lots of temporary allocations. I can make it a bit faster using some macros:

`@inbounds X[n:end] = (@view X[n:end]) .+ (sqrt(1/float(nT))*randn()).*(@view greensfunc[1:(end-n+1)]) `

Then the allocations in the `Integrating` step decreases to 383MB. However this still seems a lot, and I think is the main slowdown for this code. I would like to make this much faster so I can increase `nT` by a factor 10^4, is there a better way?

.=

maybe？ to allow fusion happen

2 Likes

Or use:

Which is the same suggested, but easier perhaps.

1 Like

Putting `@.` in front will also broadcast the `rand()` which is not the same.

So, @Ruvi_Lecamwasam, make sure to dot the assignment operator, otherwise there will be allocations on the right hand side, despite the `view`s. And put `@views` in front of the whole expression.

Also, don’t do this

It doesn’t matter for performance, but it’s unidiomatic, just write `1/nT`, this will produce a float, even though `nT` is an integer.

3 Likes

Here’s the effect:

``````function summat(nT)
T  = LinRange(0,1,nT);
X = zeros(nT);
greensfunc = exp.(-T/0.5) .* sin.(2*pi*T/0.5)
for n in 1:nT
X[n:end] =  (X[n:end]) + (sqrt(1/float(nT))*randn()).*(greensfunc[1:(end-n+1)])
end
return X
end

function summat_(nT)
T  = LinRange(0,1,nT);
X = zeros(nT);
greensfunc = exp.(-T ./ 0.5) .* sin.(2π .* T ./ 0.5)
for n in 1:nT
@views X[n:end] .= X[n:end] .+ (randn() / sqrt(nT)) .* greensfunc[1:(end-n+1)]
end
return X
end
``````

Benchmarks:

``````jl> using BenchmarkTools

jl> @btime summat(10000);
209.422 ms (71812 allocations: 1.49 GiB)

jl> @btime summat_(10000);
7.363 ms (4 allocations: 156.41 KiB)
``````
5 Likes

Although @DNF solution is perfect, I will just insist in the possibility of using `@.`, because, personally, I find having to get all those points right very bug-prone. So, taking into consideration the fact that you do not want to broadcast the `rand()` call, you can use:

``````julia> function summat_(nT)
T  = LinRange(0,1,nT);
X = zeros(nT);
greensfunc = exp.(-T ./ 0.5) .* sin.(2π .* T ./ 0.5)
for n in 1:nT
λ = rand()/sqrt(nT)
@views @. X[n:end] = X[n:end] + λ * greensfunc[1:(end-n+1)] # no more dots
end
return X
end
``````

And, since we are there, why not use a loop and `@tturbo` (just for fun):

`julia -t auto` # 4 cores

``````julia> using LoopVectorization, BenchmarkTools

julia> @btime summat_(10000);
9.129 ms (4 allocations: 156.41 KiB)

julia> function summat_loop(nT)
Random.seed!(123)
T  = LinRange(0,1,nT);
X = zeros(nT);
greensfunc = exp.(-T ./ 0.5) .* sin.(2π .* T ./ 0.5)
for n in 1:nT
λ = rand()/sqrt(nT)
@tturbo for i in 1:length(greensfunc)-n+1
X[n+i-1] = X[n+i-1] + λ * greensfunc[i]
end
end
return X
end
summat_loop (generic function with 1 method)

julia> @btime summat_loop(10000);
5.168 ms (8 allocations: 156.55 KiB)

``````

edit: In this case you can just add `@tturbo` to the original broadcasted operation as well:

``````@views @tturbo X[n:end] .= X[n:end] .+ (randn() / sqrt(nT)) .* greensfunc[1:(end-n+1)]
``````

and it works just as fine.

2 Likes