Repeatedly summing displaced array, lots of allocations and slow

Ruvi_Lecamwasam · August 20, 2021, 6:44am

I have two one-dimensional arrays of length nT:

greensfunc is an array of Float64,
X is an array of Float64 zeros.

I want to repeatedly multiply greensfunc by a random number, and add the result to X, but for each sum only consider the last n-1-elements of X (and first n-1 of greensfunc):

using TimerOutputs

function summat(nT)
    
    @timeit to "Initlialising vector" T  = LinRange(0,1,nT);
    @timeit to "Initlialising vector" X = zeros(nT);

    @timeit to "Greens function" greensfunc = exp.(-T/0.5) .* sin.(2*pi*T/0.5)

    @timeit to "Integrating" begin
    for n in 1:nT
            X[n:end] =  (X[n:end]) + (sqrt(1/float(nT))*randn()).*(greensfunc[1:(end-n+1)])
    end
    end
    
    return X
    
end

to = TimerOutput();
@time sol = summat(10000);
show(to)

The loop however has lots of temporary allocations. I can make it a bit faster using some macros:

@inbounds X[n:end] = (@view X[n:end]) .+ (sqrt(1/float(nT))*randn()).*(@view greensfunc[1:(end-n+1)])

Then the allocations in the Integrating step decreases to 383MB. However this still seems a lot, and I think is the main slowdown for this code. I would like to make this much faster so I can increase nT by a factor 10^4, is there a better way?

jling · August 20, 2021, 7:22am

.=

maybe？ to allow fusion happen

lmiq · August 20, 2021, 9:45am

Or use:

Which is the same suggested, but easier perhaps.

DNF · August 20, 2021, 11:29am

Putting @. in front will also broadcast the rand() which is not the same.

So, @Ruvi_Lecamwasam, make sure to dot the assignment operator, otherwise there will be allocations on the right hand side, despite the views. And put @views in front of the whole expression.

Also, don’t do this

It doesn’t matter for performance, but it’s unidiomatic, just write 1/nT, this will produce a float, even though nT is an integer.

DNF · August 20, 2021, 12:00pm

Here’s the effect:

function summat(nT)
    T  = LinRange(0,1,nT);
    X = zeros(nT);
    greensfunc = exp.(-T/0.5) .* sin.(2*pi*T/0.5)
    for n in 1:nT
            X[n:end] =  (X[n:end]) + (sqrt(1/float(nT))*randn()).*(greensfunc[1:(end-n+1)])
    end
    return X
end

function summat_(nT)
    T  = LinRange(0,1,nT);
    X = zeros(nT);
    greensfunc = exp.(-T ./ 0.5) .* sin.(2π .* T ./ 0.5)
    for n in 1:nT
        @views X[n:end] .= X[n:end] .+ (randn() / sqrt(nT)) .* greensfunc[1:(end-n+1)]
    end
    return X
end

Benchmarks:

jl> using BenchmarkTools

jl> @btime summat(10000);
  209.422 ms (71812 allocations: 1.49 GiB)

jl> @btime summat_(10000);
  7.363 ms (4 allocations: 156.41 KiB)

lmiq · August 20, 2021, 8:22pm

Although @DNF solution is perfect, I will just insist in the possibility of using @., because, personally, I find having to get all those points right very bug-prone. So, taking into consideration the fact that you do not want to broadcast the rand() call, you can use:

julia> function summat_(nT)
           T  = LinRange(0,1,nT);
           X = zeros(nT);
           greensfunc = exp.(-T ./ 0.5) .* sin.(2π .* T ./ 0.5)
           for n in 1:nT
               λ = rand()/sqrt(nT)
               @views @. X[n:end] = X[n:end] + λ * greensfunc[1:(end-n+1)] # no more dots
           end
           return X
       end

And, since we are there, why not use a loop and @tturbo (just for fun):

julia -t auto # 4 cores

julia> using LoopVectorization, BenchmarkTools

julia> @btime summat_(10000);
  9.129 ms (4 allocations: 156.41 KiB)

julia> function summat_loop(nT)
           Random.seed!(123)
           T  = LinRange(0,1,nT);
           X = zeros(nT);
           greensfunc = exp.(-T ./ 0.5) .* sin.(2π .* T ./ 0.5)
           for n in 1:nT
               λ = rand()/sqrt(nT)
               @tturbo for i in 1:length(greensfunc)-n+1
                 X[n+i-1] = X[n+i-1] + λ * greensfunc[i]
               end
           end
           return X
       end
summat_loop (generic function with 1 method)

julia> @btime summat_loop(10000);
  5.168 ms (8 allocations: 156.55 KiB)

edit: In this case you can just add @tturbo to the original broadcasted operation as well:

@views @tturbo X[n:end] .= X[n:end] .+ (randn() / sqrt(nT)) .* greensfunc[1:(end-n+1)]

and it works just as fine.

Topic		Replies	Views
Common allocation mistakes Performance memory-allocation	47	7174	August 21, 2023
Reduce memory allocated in array view and in place sum Performance question	12	694	November 10, 2023
Reduce the allocation to zero General Usage question	7	458	April 7, 2022
Huge memory allocation New to Julia array , memory-allocation	17	1114	January 25, 2024
Unclear allocation behaviour with built-in sum() Performance question , memory-allocation	4	204	February 2, 2025

Repeatedly summing displaced array, lots of allocations and slow

Related topics