I want to repeatedly multiply greensfunc by a random number, and add the result to X, but for each sum only consider the last n-1-elements of X (and first n-1 of greensfunc):
using TimerOutputs
function summat(nT)
@timeit to "Initlialising vector" T = LinRange(0,1,nT);
@timeit to "Initlialising vector" X = zeros(nT);
@timeit to "Greens function" greensfunc = exp.(-T/0.5) .* sin.(2*pi*T/0.5)
@timeit to "Integrating" begin
for n in 1:nT
X[n:end] = (X[n:end]) + (sqrt(1/float(nT))*randn()).*(greensfunc[1:(end-n+1)])
end
end
return X
end
to = TimerOutput();
@time sol = summat(10000);
show(to)
Then the allocations in the Integrating step decreases to 383MB. However this still seems a lot, and I think is the main slowdown for this code. I would like to make this much faster so I can increase nT by a factor 10^4, is there a better way?
Putting @. in front will also broadcast the rand() which is not the same.
So, @Ruvi_Lecamwasam, make sure to dot the assignment operator, otherwise there will be allocations on the right hand side, despite the views. And put @views in front of the whole expression.
Also, don’t do this
It doesn’t matter for performance, but it’s unidiomatic, just write 1/nT, this will produce a float, even though nT is an integer.
function summat(nT)
T = LinRange(0,1,nT);
X = zeros(nT);
greensfunc = exp.(-T/0.5) .* sin.(2*pi*T/0.5)
for n in 1:nT
X[n:end] = (X[n:end]) + (sqrt(1/float(nT))*randn()).*(greensfunc[1:(end-n+1)])
end
return X
end
function summat_(nT)
T = LinRange(0,1,nT);
X = zeros(nT);
greensfunc = exp.(-T ./ 0.5) .* sin.(2π .* T ./ 0.5)
for n in 1:nT
@views X[n:end] .= X[n:end] .+ (randn() / sqrt(nT)) .* greensfunc[1:(end-n+1)]
end
return X
end
Benchmarks:
jl> using BenchmarkTools
jl> @btime summat(10000);
209.422 ms (71812 allocations: 1.49 GiB)
jl> @btime summat_(10000);
7.363 ms (4 allocations: 156.41 KiB)
Although @DNF solution is perfect, I will just insist in the possibility of using @., because, personally, I find having to get all those points right very bug-prone. So, taking into consideration the fact that you do not want to broadcast the rand() call, you can use:
julia> function summat_(nT)
T = LinRange(0,1,nT);
X = zeros(nT);
greensfunc = exp.(-T ./ 0.5) .* sin.(2π .* T ./ 0.5)
for n in 1:nT
λ = rand()/sqrt(nT)
@views @. X[n:end] = X[n:end] + λ * greensfunc[1:(end-n+1)] # no more dots
end
return X
end
And, since we are there, why not use a loop and @tturbo (just for fun):
julia -t auto # 4 cores
julia> using LoopVectorization, BenchmarkTools
julia> @btime summat_(10000);
9.129 ms (4 allocations: 156.41 KiB)
julia> function summat_loop(nT)
Random.seed!(123)
T = LinRange(0,1,nT);
X = zeros(nT);
greensfunc = exp.(-T ./ 0.5) .* sin.(2π .* T ./ 0.5)
for n in 1:nT
λ = rand()/sqrt(nT)
@tturbo for i in 1:length(greensfunc)-n+1
X[n+i-1] = X[n+i-1] + λ * greensfunc[i]
end
end
return X
end
summat_loop (generic function with 1 method)
julia> @btime summat_loop(10000);
5.168 ms (8 allocations: 156.55 KiB)
edit: In this case you can just add @tturbo to the original broadcasted operation as well: