In computational physics, self-consistent calculations are frequently met. In such calchlations, a function might be called repeatedly, and there might be some auxiliary arrays that are used in the function, which will keep the same size in the whole process.
For the sake of performance, what is the best practice to deal with such arrays?
I can think of the following ways:
- do nothing. it may look like this:
function foo(n_atoms, n_basis)
bar = zeros(n_atoms, n_basis)
# do something
end
The bar array might be allocated and freed frequently, which should take some time, I suppose?
- global variables. A non-const global variable might hurt performance, so maybe like this:
mutable struct MyBuffer
bar::Matrix{Float64}
end
const buffer = MyBuffer(zeros(0, 0))
function set_bar!(n_atoms, n_basis)
global buffer
buffer.bar = zeros(n_atoms, n_basis)
end
function foo()
global buffer
# do something with buffer.bar
end
Remember this is a self-consistent calculation, so that n_atoms and n_basis are promised to keep constant. set_bar!
only need to be called once.
3. let-block. This is somehow similar to global variables:
let
global get_bar
bar = zeros(0, 0)
function get_bar(n_atoms, n_basis)
sizes = size(bar)
if sizes[1] < n_atoms || sizes[2] < n_basis
bar = zeros(n_atoms, n_basis)
end
return view(bar, 1:n_atoms, 1:n_basis)
end
end
function foo(n_atoms, n_basis)
bar = get_bar(n_atoms, n_basis)
# do something with bar
end
Of cause, view technique can also be used in the second way.
4. Overall struct. In practice, such a function are likely to receive many input parameters, it may look more like:
function foo(system::TotalSystem, n_atoms, n_basis)
end
In fact, n_atoms and n_basis are also very likely to be somewhere inside system.
So, this way is simply add a buffer field in the definition of TotalSystem, so it looks like:
function foo(system::TotalSystem, n_atoms, n_basis)
bar = system.buffer.bar
# do something with bar
end
The main pros and cons are the same: we need to declare every buffer in TotalSystem, instead of in the module / near the function that actually use it. On the one hand, this might make the code more difficult to read. On the other band, if we want to squeeze out the last bit of performance, we may easily merge some buffers in this way.
Consider there is another function buz
which need a buffer of size n_atoms * 2n_basis, and will never be needed to be kept in memory the same time as bar (promised by physics). In this way we can only declare one matrix in TotalSystem and set it to be n_atoms * 2n_basis. In the second way, we may need one matrix in module Foo, and another in module Buz.
Are there any other ways to do this? What should be the best practice?