Best practice for array allocation in self-consistent calculations

In computational physics, self-consistent calculations are frequently met. In such calchlations, a function might be called repeatedly, and there might be some auxiliary arrays that are used in the function, which will keep the same size in the whole process.
For the sake of performance, what is the best practice to deal with such arrays?
I can think of the following ways:

  1. do nothing. it may look like this:
function foo(n_atoms, n_basis)
    bar = zeros(n_atoms, n_basis)
    # do something
end

The bar array might be allocated and freed frequently, which should take some time, I suppose?

  1. global variables. A non-const global variable might hurt performance, so maybe like this:
mutable struct MyBuffer
    bar::Matrix{Float64}
end
const buffer = MyBuffer(zeros(0, 0))
function set_bar!(n_atoms, n_basis)
    global buffer
    buffer.bar = zeros(n_atoms, n_basis)
end
function foo()
    global buffer
    # do something with buffer.bar
end

Remember this is a self-consistent calculation, so that n_atoms and n_basis are promised to keep constant. set_bar! only need to be called once.
3. let-block. This is somehow similar to global variables:

let
    global get_bar
    bar = zeros(0, 0)
    function get_bar(n_atoms, n_basis)
        sizes = size(bar)
        if sizes[1] < n_atoms || sizes[2] < n_basis
            bar = zeros(n_atoms, n_basis)
        end
        return view(bar, 1:n_atoms, 1:n_basis)
    end
end

function foo(n_atoms, n_basis)
    bar = get_bar(n_atoms, n_basis)
    # do something with bar
end

Of cause, view technique can also be used in the second way.
4. Overall struct. In practice, such a function are likely to receive many input parameters, it may look more like:

function foo(system::TotalSystem, n_atoms, n_basis)
end

In fact, n_atoms and n_basis are also very likely to be somewhere inside system.
So, this way is simply add a buffer field in the definition of TotalSystem, so it looks like:

function foo(system::TotalSystem, n_atoms, n_basis)
    bar = system.buffer.bar
    # do something with bar
end

The main pros and cons are the same: we need to declare every buffer in TotalSystem, instead of in the module / near the function that actually use it. On the one hand, this might make the code more difficult to read. On the other band, if we want to squeeze out the last bit of performance, we may easily merge some buffers in this way.
Consider there is another function buz which need a buffer of size n_atoms * 2n_basis, and will never be needed to be kept in memory the same time as bar (promised by physics). In this way we can only declare one matrix in TotalSystem and set it to be n_atoms * 2n_basis. In the second way, we may need one matrix in module Foo, and another in module Buz.

Are there any other ways to do this? What should be the best practice?

1 Like

Avoid global variables.

As to the other part of your question: you don’t need to stuff all your data into TotalSystem, instead compose it from components.

Global variables are generally viewed as bad, but, just like goto, maybe there are circumstances where they are useful? I’m not sure.

As for another part, if I understand correctly, what you mean is that to define another mutable struct TotalSystemBuffer, then add to every function that needs buffer a new parameter buffer::TotalSystemBuffer?

This surely works, and after careful thinking, I find it helpful for readability it we use functions like function foo(system::TotalSystem).
Without this, we have to first define buffer types used for certain utils, then define type TotalSystem which need all these buffer types, then the actual functions. The result is, definition of buffer types for certain utils cannot be located near the actual util codes.
With this, we can first define TotalSystem, then, for every util, buffer type and actual functions with signature function func!(buffer::CertainBuffer, system::TotalSystem). So the code for certain util can be grouped together.

This modified 4th method seems quite nice.

1 Like

I would make a “config” or “workspace” object that holds the temporary buffers, and have the caller construct and re-use it. Examples: https://docs.juliadsp.org/stable/periodograms/#Configuration-objects, https://github.com/baggepinnen/DynamicAxisWarping.jl/blob/a5e760e49758b2aaf6bcb52aca855585be7f6c9c/src/gdtw.jl#L1-L12. Similar to passing around buffer, but grouped up if you have more than one, and possibly along with the settings (I guess n_atoms, n_basis here), if those determine the size of the buffers.

2 Likes

First benchmark, and only do this optimization if you find out that allocations are a significant portion of the time spent. If your function is relatively expensive (eg you’re diagonalizing a hamiltonian at each step) then most likely the allocations are not a problem. As a general rule, if your runtime is a superlinear function of the data size, don’t bother with preallocation.

2 Likes

You can also consider wrapping the arrays into function-like objects.

struct FooStruct
    bar::Matrix{Float64}
    FooStruct(n_atoms, n_basis) = new(zeros(n_atoms, n_basis))
end

function (foo::FooStruct)(n_atoms, n_basis, other_useful_parameters)
    bar = foo.bar
    # do something
end
1 Like

A typical approach is to create a function

function foo!(buffer, args)
    buffer .= 0 # don't use zeros, mutate in-place
    # do something 
end

The following approach has several issues:

First of all, don’t use globals.

Secondly, it doesn’t actually use the buffer, but instead allocates a whole new array each time, with zeros(), and then assigns that to buffer.bar, ruining the whole point of the buffer.

Thirdly, the MyBuffer struct should not be mutable. Mutate buffer.bar not buffer, which can then be immutable. An immutable struct can hold mutable objects in its fields.

3 Likes

Does using const globals in module scope (as in option 2 of OP) really hurt performance? I naively thought that if the global is a const, there is no type inference problem, so the performamce will be more or less unaffected.

If the binding is const, the type will be fixed, yes, but you can still have a performance impact due to the compiler not knowing whether there’s something else that may access that global as well.

set_bar! need to be called only in the first iteration, so I think it should actually use the buffer since the second iteration?
Second, I did not find a way to mutate field but not struct, unless I use Ref. Maybe this is what you mean? However, I do find that from Julia 1.8, I can declare some fields of a mutable struct as const so that they are immutable.

I Definitely support this. Even for linear scaling algorithms I often find that removing all allocations gives me only around 10-30% speed up. Somebody explained to me once that Julia is reasonably clever about reusing such temporary arrays but TBH I never followed up on that … I’d be curious to hear an expert on this point actually.