Greetings everyone,
Here I would like to use a specific example from a code of mine, to help me (and hopefully others) better understand how allocations work.
I’ve got a code which imports data (xyz coordinates of hydrogen particles) from a very large file, calculates the relative positions of each possible pair of particles, and then does some calculations.
After reading the basic performance tips a few times, I gather that to make code efficient, I should preallocate everything, avoid global variables, use functions, and then life should be good.
Thus, my idea is, if I preallocate some big arrays, and my functions just replace their values at each step, new allocations wouldn’t be created since new values just replace old ones.
Turns out some of my functions are actually doing that, while the others are allocating extra memory, and I can’t understand why.
Below is an MRE which captures the main points I want to make.
using LinearAlgebra
using CSV
using DataFrames
data = CSV.write("data.csv",
DataFrame([collect(1:1000) ones(1000, 1) rand(1000, 3)], :auto))
# Preallocate everyhting
# (shouldn't be worth paying attention to the details here)
positions::Matrix{Float64} = Matrix{Float64}(undef, 666, 3)
box::Matrix{Float64} = [-1/2 1/2; -1/2 1/2; -1/2 1/2]
Hpairs::Vector{Vector{Float64}} = [ones(Float64, 3) for _ in 1:floor(Int, 666 * (666 - 1) / 2)]
rtp = copy(Hpairs)
F::Vector{Vector{ComplexF64}} = [complex(ones(Float64, 3)) for _ in 1:floor(Int, 666 * (666 - 1) / 2)]
# Define functions
function read_H_positions!(positions::Matrix{Float64})
# normally I'd only read a part of the file using CSV.read(....,skipto=_,limit=_)
positions[:, :] = Matrix(CSV.read("data.csv", DataFrame)[Not(1:3:end), 3:end])
end
function getpairs!(Hpairs::Vector{Vector{Float64}}, positions::Matrix{Float64})
x = 1
for k in axes(positions, 1)
for j in axes(positions, 1)
j > k || continue
Hpairs[x][:] = @views (positions[k, :] .- positions[j, :])
x += 1
end
end
end
function periodicboundary!(Hpairs::Vector{Vector{Float64}}, box::Matrix{Float64})
for i in 1:length(Hpairs)
for l in 1:3
if Hpairs[i][l] > box[l, 2]
Hpairs[i][l] -= 2 .* box[l, 2]
elseif Hpairs[i][l] < box[l, 1]
Hpairs[i][l] -= 2 .* box[l, 1]
end
end
end
end
function cart2sph!(rtp::Vector{Vector{Float64}}, xyz::Vector{Vector{Float64}})
for i in eachindex(rtp)
rtp[i][1] = norm(xyz[i], 2)
rtp[i][2] = acos.(xyz[i][3] ./ norm(xyz[i], 2))
rtp[i][3] = atan.(xyz[i][2], xyz[i][1])
end
end
function F012!(F::Vector{Vector{ComplexF64}}, rtp::Vector{Vector{Float64}})
for i in 1:length(rtp)
F[i][:] = [(1 - 3 .* (rtp[i][2]) .^ 2) ./ rtp[i][1] .^ 3,
(sin.(rtp[i][2]) .* cos(rtp[i][2]) .* exp(-1im .* rtp[i][3])) ./ rtp[i][1] .^ 3,
(sin.(rtp[i][2]) .^ 2 .* exp.(-2im .* rtp[i][3])) ./ rtp[i][1] .^ 3]
end
end
#Run functions
@time read_H_positions!(positions);
@time getpairs!(Hpairs, positions);
@time periodicboundary!(Hpairs, box);
@time cart2sph!(rtp, Hpairs);
@time F012!(F, rtp);
It seems that periodicboundary!() and cart2sph!() functions work as intended, and they don’t generate unwanted allocations, whereas the other functions are a bit naughty. Why is that happening, and how would you deal with this?
The “Run functions” section woulld normally be in a big loop, so after a few repetitions allocations become too much, and the program pretty much never finishes.
I’m sure there are many other mistakes and pitfalls in this code, and I’d be very grateful if you could point them out, but the essence is, why the exra allocations??