Pre-allocating Arrays for intermediate calculations

krenova · March 11, 2019, 7:21am

To save on computation time for a function i often use, i created a preallocated array that i pass to the function to store intermediate calculations. As an example, albeit a contrived one, i have this function test() that I often need to use:

# Let N be some number representing some integer representing the array size
_b = Array{Float64,1}(undef,N)
function test( a, b = _b)
     for i in 1:100 
          b[i] += a + i
     end
     return( b .* 100 )
end

In the example, i created a preallocated array and use it to store intermediate results in the hope to save on computation time. However, I wonder if this makes sense, if it could cause some unforeseen bugs later on and if there are better ways to go about reducing computation time.

Please advise.

‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’‘’’
Please note that i meant b[i] = a + i. Which is very different from b[i] += a + i. Sorry for the confusion. The responses below still hold. thanks for pointing out @DNF.

kristoffer.carlsson · March 11, 2019, 7:26am

Rather than using global variables, it is often better to let the caller preallocate the workspace needed. The caller can then hoist that preallocation outside of the loop.

krenova · March 11, 2019, 7:37am

Hi @kristoffer.carlsson, thanks for your prompt response. Do you mean that I do not have to worry about preallocation within functions as the function will automatically do a preallocation? In the context of the given example, it means the object b, once created, will always exists till the programme stops?

Elrod · March 11, 2019, 8:04am

No, more like


function test!( b, a)
    @inbounds @simd for i in 1:100 
        b[i] += a + i
    end
    # Why not reuse "b" again?
    b .*= 100
end

function test_a_lot(N)
    _b = Array{Float64,1}(undef, 100)
    result = 0.0
    for n ∈ 1:N
        result += (-0.01)^n * sum(test!(_b, 0.5n))
    end
    result
end

That is, whatever function you’re using that is using test many times should manage preallocating the buffer.
If you’re only using test once, there isn’t going to be any benefit.

If you’re using it many times from many different places, maybe you can pass around a program state object that stores all the state and buffer variables, so you only need to manage a single object.

DNF · March 11, 2019, 8:19am

It’s a bit odd to go to the trouble of pre-allocating, only to throw away the benefit and make a new vector in the return statement…

You should definitely be careful with globals.

_b will be mutated each time you call test, so the returned result will be different each time, even with the same input. Is that intentional? And using undef and then incrementing, means that the output may contain garbage. Is that intentional too?

PS. Minor point: return is a keyword not a function, so it’s idiomatic to not use parens: return b .* 100

tkluck · March 11, 2019, 7:47pm

Maybe capturing into a closure is the pattern you are looking for. Here’s what that could look like (with a similarly contrived example):

julia> f = let b=Vector{Int}(undef, 1000)
           function f(x)
              b .= (1:1000) .+ x
              return sum(b)
           end
       end
f (generic function with 1 method)

julia> f(3)
503500

I think this expresses what you want to do in the nicest way. Depending on your use case, it does have a performance downside: Performance Tips · The Julia Language . Do benchmark your code to check.

Tamas_Papp · March 12, 2019, 7:29am

I would argue that in most cases, defining a struct that holds the buffer and possibly making it callable is a better solution — it provides better introspection and maintainability, and of course you are less likely to run into the above bug.

krenova · March 12, 2019, 7:45am

thanks @tkluck, this does indeed reflect what I hope to do. But as you cautioned, it could result in performance issues. Will keep that in mind.

krenova · March 12, 2019, 7:49am

thanks @Tamas_Papp, do you mean that I should define a struct that holds these interim objects that i can call when needed? Hope its not a stupid question, I code predominantly in R and dont usually find myself having to deal with these different types of programming tools.

krenova · March 12, 2019, 7:52am

thanks for pointing out @DNF, it should have been b[i] = a + i in the example.

Also thanks for pointing out that parentheses are not necessary for the keyword return

Tamas_Papp · March 12, 2019, 8:07am

maybe an example helps:

struct BufferedContrivedExample{T}
    buffer::T
end

BufferedContrivedExample(T, N::Integer) = BufferedContrivedExample(Vector{T}(undef, N))

function (bce::BufferedContrivedExample)(a)
    b = bce.buffer
    for i in axes(b, 1)
        b[i] += a + i
    end
    b .* 100
end

bce = BufferedContrivedExample(Float64, 100);

bce(1.0)

Topic		Replies	Views
Preallocation and data race in parallelization General Usage	9	304	March 1, 2024
Pre-allocating array efficiency Performance	5	236	October 12, 2024
Pre-allocating outputs, inplace functions and performance General Usage	13	1303	February 7, 2020
Allocation help - preforming computations over loops Performance	7	391	July 21, 2022
Memory Pre-allocation in the global scope New to Julia question	20	1160	August 18, 2022

Pre-allocating Arrays for intermediate calculations

Related topics