Memory Pre-allocation in the global scope

I have written some code that performs no memory allocation. To do so, I defined all the required memory buffers beforehand (in the global scope) and passed them into mutating functions wherever needed.
I was wondering whether this is conceptually correct. I mean, is this the normal way of avoiding memory allocations within function calls?

This is a small example of what I have implemented:

using BenchmarkTools
n=10
N=100
R1=zeros(Float64,n)
R2=zeros(Float64,n)
V=zeros(Float64,n,N);
P=zeros(Float64,n,N);
G=zeros(Float64,n);
function F1!(V::Matrix{Float64}, n::Int64, i::Int64, R1::Vector{Float64}, R2::Vector{Float64}, P::Matrix{Float64}, G::Vector{Float64})
  for j=1:n
    R1[j] = rand()
    R2[j] = rand()
  end
  @. V[:,i] =  R1*(@view P[:,i]) + R2 * G
end

i=10
@btime  F1!(V::Matrix{Float64}, n::Int64, i::Int64, R1::Vector{Float64}, R2::Vector{Float64}, P::Matrix{Float64}, G::Vector{Float64})


Using global variables is poor practice for performance reasons. You are much better off wrapping all of that in a function called “calc” or “go” or something and having it return the results. You can still preallocate within that function.

See here: Performance Tips · The Julia Language


So, do you mean something like this?

function F1!(V::Matrix{Float64}, n::Int64, i::Int64, R1::Vector{Float64}, R2::Vector{Float64}, P::Matrix{Float64}, G::Vector{Float64})
  for j=1:n
    R1[j] = rand()
    R2[j] = rand()
  end
  @. V[:,i] =  R1*(@view P[:,i]) + R2 * G
end

function f2()
  n=10
  N=100
  R1=zeros(Float64,n)
  R2=zeros(Float64,n)
  V=zeros(Float64,n,N);
  P=zeros(Float64,n,N);
  G=zeros(Float64,n);

  i=10
  F1!(V::Matrix{Float64}, n::Int64, i::Int64, R1::Vector{Float64}, R2::Vector{Float64}, P::Matrix{Float64}, G::Vector{Float64})

end

@btime  f2()

Maybe I’m missing something? Here are the results of the two runs:

The first implementation:

i=10
@btime  F1!(V::Matrix{Float64}, n::Int64, i::Int64, R1::Vector{Float64}, R2::Vector{Float64}, P::Matrix{Float64}, G::Vector{Float64})
73.717 ns (0 allocations: 0 bytes)

The second implementation:

@btime  f2()
1.030 μs (5 allocations: 16.30 KiB)

Yes, in the first case you are timing only F1!, because you had already allocated everything before running @btime.

Another thing: in Julia we usually don’t adorn things with explicit type declarations. That said, it may be mitigating the usual issues with using global variables here, since the problem with globals is that they are not type-stable.
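For illustration, here is a minimal sketch (the names are made up) of why non-const globals are slow while const globals are not:

```julia
a = 1.0          # non-const global: its type could change at any time
f() = a + 1      # the compiler cannot assume a type for `a`, so `f` is type-unstable

const b = 1.0    # const global: the binding and its type are fixed
g() = b + 1      # compiles to fast, type-stable code

f()   # works, but goes through dynamic dispatch
g()   # same result, fully compiled
```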


Oh, true!
Thanks a lot.
As the last question, do you know a reference other than Julia’s documentation where I can dig more into the efficiency tips?

Julia’s documentation (Performance Tips, mainly) should really be your starting point, and I don’t think anything else is even necessary from the “theoretical” side. But on the practical side, there are some packages being developed, like JET.jl, that help with automation.

Regarding your initial question, there are several packages that help with preallocation.


Both strategies are fine, depending on what you want. Your F1! function is fine, and you have correctly passed all variables as parameters of that function (this is fundamental: the function does not use global variables internally). Then you are benchmarking the performance of F1!. This is the usual workflow when your function does “the job” and you will use it with different data in an interactive environment.

Defining f2 as above helps in particular if you are developing that code using Revise: when you change the content of f2, you can just rerun it to obtain new results.

In my application, I need to run F1!, along with many other functions, many times. So I reserved some pieces of memory and overwrite them in each iteration. Please correct me if I’m wrong: in the first approach, the variables are defined in the global scope, but when I pass them into a function, overwriting them doesn’t cause performance issues. Is that correct?

You could just do

@. V[:,i] =  rand()*(@view P[:,i]) + rand() * G

and eliminate the R1 and R2 arrays completely. The @. macro will turn rand() into rand.(), which will call it once per element.
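For example (the array contents here are made up just to show the broadcast):

```julia
# @. rewrites the line below as V[:,i] .= rand.() .* view(P,:,i) .+ rand.() .* G,
# so rand() runs once per output element and no temporary arrays are allocated
P = ones(3, 2)
G = fill(2.0, 3)
V = zeros(3, 2)
i = 1
@. V[:,i] = rand()*(@view P[:,i]) + rand() * G
# each entry of V[:,1] is r1*1 + r2*2 with fresh r1, r2 in [0, 1)
```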

(Also note that there is typically no need to pass array sizes like n into a function, because you can get them from size or length.)

Also, note that your type signatures are way stricter than needed, unnecessarily reducing the generality of your function with no benefit in performance. See argument-type declarations.
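As a sketch of what a looser signature might look like (the name F1_loose! is made up), abstract types accept views, Float32 arrays, etc., with no performance penalty:

```julia
# Same behavior as F1!, but generic over array and element types
function F1_loose!(V::AbstractMatrix, P::AbstractMatrix, G::AbstractVector, i::Integer)
    @. V[:,i] = rand()*(@view P[:,i]) + rand() * G
    return V
end

# now Float32 arrays work too; dispatch still specializes the code fully
V32 = zeros(Float32, 4, 2)
F1_loose!(V32, ones(Float32, 4, 2), ones(Float32, 4), 1)
```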


I think there is nothing wrong with const global variables as long as they are containers of concrete types.

Even better is the use of a const global struct that contains all your pre-allocated arrays. Pass it as the first parameter to your worker function and you are done…

So either this:

n=10
N=100
const R1=zeros(Float64,n)
const R2=zeros(Float64,n)
const V=zeros(Float64,n,N)
const P=zeros(Float64,n,N)
const G=zeros(Float64,n)

or this:

using Parameters
n=10
N=100
@with_kw struct Mem
    R1::Vector{Float64}=zeros(Float64,n)
    R2::Vector{Float64}=zeros(Float64,n)
    V::Matrix{Float64}=zeros(Float64,n,N)
    P::Matrix{Float64}=zeros(Float64,n,N)
    G::Vector{Float64}=zeros(Float64,n)
end 
const mem=Mem()
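A worker function using such a pre-allocated struct could look like the following sketch (plain struct, no Parameters.jl needed; the names Mem2 and worker! are illustrative):

```julia
struct Mem2
    R1::Vector{Float64}
    R2::Vector{Float64}
    V::Matrix{Float64}
    P::Matrix{Float64}
    G::Vector{Float64}
end
# convenience constructor that pre-allocates everything
Mem2(n, N) = Mem2(zeros(n), zeros(n), zeros(n, N), zeros(n, N), zeros(n))

function worker!(m::Mem2, i)
    @. m.V[:,i] = rand()*(@view m.P[:,i]) + rand() * m.G
end

const mem2 = Mem2(10, 100)
worker!(mem2, 10)   # mutates mem2.V in place; no allocations after compilation
```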

Yes, correct. And doing what you do is the easiest way to benchmark your functions independently if they are to be used within a more complex code. We do that all the time.

(ps: I’m not sure what you mean by “overwriting” here, but what you did there is fine)


Cool! Thanks for the tips! I do have a question regarding the second one, though. Don’t size or length cause temporary allocations inside the function?

Actually, it made the code faster:

49.192 ns (0 allocations: 0 bytes)

If the variables are passed to the function as parameters, there is no need for them to be constants.

Making them constants can make the testing of the function annoying, if you want to change their values.

No, they don’t.
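You can check that yourself with Base.@allocated (the names Gtest, getlen, and getsz are made up; call the functions once first so compilation isn’t counted):

```julia
const Gtest = zeros(1000)

getlen() = length(Gtest)   # just reads the array header
getsz()  = size(Gtest)     # returns a stack-allocated tuple

getlen(); getsz()          # run once to compile

@allocated(getlen())       # expected: 0 bytes
@allocated(getsz())        # expected: 0 bytes
```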

In summary, that specifically you can write as:

julia> function F1!(V, P, G, i)
         @. V[:,i] =  rand()*(@view P[:,i]) + rand() * G
       end
F1! (generic function with 1 method)

and test it with:

julia> n=10; N=100; V=zeros(n,N); P=zeros(n,N); G=zeros(n); i=10;

julia> @btime F1!($V,$P,$G,$i)
  30.811 ns (0 allocations: 0 bytes)

(also note the $ in the call to @btime; interpolating the variables is required so that the benchmark does not also measure the cost of accessing non-const globals)


In particular, when we talk about “allocations” we mean heap allocations, which are generally only needed to create (or enlarge) variable-sized containers like Array or Dict, or to create mutable struct objects. Allocating such blocks of memory requires a relatively expensive call into the memory allocator (think malloc in C).

In contrast, immutable values like numbers (except bignums), tuples, and struct objects can be stored much more cheaply, e.g. in stack or CPU-register memory, with no system call, so one doesn’t typically worry about the performance cost of “allocating” them.
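A small sketch of that difference (the function names are made up), again using @allocated:

```julia
make_tuple(a, b) = (a, b)    # immutable, fixed size: can live on the stack
make_array(a, b) = [a, b]    # Vector: needs a heap allocation

make_tuple(1.0, 2.0); make_array(1.0, 2.0)   # compile first

@allocated(make_tuple(1.0, 2.0))   # expected: 0
@allocated(make_array(1.0, 2.0))   # expected: > 0 (the Vector's buffer)
```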


This is not true. If you have a definition like

const R1=zeros(10)

you can change the values, e.g.

R1[1]=10

The only thing you cannot do is to change the type.


Thanks for clarifying that. I meant “values” in a more general sense, as in wanting to change which array the variable label is bound to. There is that subtlety:

julia> const x = [1,2];

julia> y = [3,4]; # same type of x

julia> x = y
WARNING: redefinition of constant x. This may fail, cause incorrect answers, or produce other errors.
2-element Vector{Int64}:
 3
 4

Well, this works:

const x = [1,2]
y = [3,4]
x .= y

Yes, you have to add a dot… not so difficult, is it?

Let me try to summarize (because there is a lot of confusion around this):

  1. Your initial approach is fine in terms of performance.
    “Avoid global variables” means avoid inheriting global variables through scoping. Your code will use global variables as inputs and outputs; this is normal.
    F1!() could have been defined without taking any arguments and would still work; it would then obtain the values from the outer (in this case global) scope. That would be bad practice and slow unless those variables were declared const. The const approach is intended for actual constants that are fixed across all simulations (which none of your variables are). It is useful because const globals have a fixed type, so they are visible to all functions and still fast to use. Link

  2. You don’t need to allocate R1 and R2 in your example. Just call rand directly.
    F1!(V, P, G, i) = @. V[:,i] = rand()*(@view P[:,i]) + rand() * G

  3. You could better organize your function inputs to be more readable.

  • Do not pass array sizes as arguments. size is very cheap to compute, so passing n and N alongside the arrays themselves just makes your code harder to read and more error-prone. You could have written the loop (no longer needed) as for j in eachindex(G); then you don’t need n at all: link.
  • Some consider the type declarations unnecessary fluff: link. They don’t practically do anything besides document and throw better errors. This topic often causes heated arguments though.
  • A setup function for allocating the arrays would clean up your code, but it isn’t necessary for performance.
function setup(n, N)
    V = zeros(Float64, n, N)
    P = zeros(Float64, n, N)
    G = zeros(Float64, n)
    return V, P, G
end
function compute!(V, P, G, i)
    @. V[:,i] = rand()*(@view P[:,i]) + rand() * G
end

n = 10
N = 100
i = 10
V, P, G = setup(n, N)
compute!(V, P, G, i)
  • A custom struct could further clean up your code, though I personally think the previous version is good enough.
struct Parameters
    V::Matrix{Float64}
    P::Matrix{Float64}
    G::Vector{Float64}
end
function setup(n, N)
    V = zeros(Float64, n, N)
    P = zeros(Float64, n, N)
    G = zeros(Float64, n)
    return Parameters(V, P, G)
end
function compute!(data, i)
    @. data.V[:,i] = rand()*(@view data.P[:,i]) + rand() * data.G
end


n = 10
N = 100
i = 10
data = setup(n, N)
compute!(data, i)
  • Typically in Julia, the function and struct definitions (top half of above code) would be stored in a separate package or file from the function calls (bottom half of above code) for better organization: link. Your original code has function and variable definitions all mixed together which makes it hard to read.

No, but it means a different thing.