# Where is the extra memory allocation?

I wrote a program several days ago, and the memory allocation is acceptable (2.79k allocations: 124.820MiB). Today, I add some new functions in it to implenment another feature. Then the memory allocation becomes much bigger(19.49M allocations : 1.864GiB). Then I use the --track-allocation=all to see where is the extra allocation. But in the .mem files, the memory allocation in each line of the newly added functions is 0. I am confused now and wanna know where is the additional memory allocation. If it is necessary, I will post my code and mem files. Thank you for your help.

From that description that smells like a type instability that was introduced. But yes, we need the code.

Thanks for your reply and following are the .mem files.
The newly added function is isintersect! to judge whether a line segment is intersect with a given crack.
This is the files with the newly added functions working.




This is the .mem file with the newly added functions muted (add a # in the newly added function.)




I deletate the .mem files because I give a woking code following.

That is hard to follow. It would be better if you provided a working example. But the first guess here is a type instability, thus it is probably a good idea to inspect the code with @code_warntype.

Thank you for your reply! That is ok. And I have to spend some time to prepare, because this is a part of a big program.

Here is a working example.

# using Cell List Algorithm to do the Fixed Radius Near Neighbor Problem
# newly add function: judging whether the point pair is intersect with a given crack, if yes, then donot save the pointpair
##########################################
using Revise
mutable struct PairInf
numpair_pair::Int64
ithpoi_pair::Vector{Int64}
jthpoi_pair::Vector{Int64}
pairlen_pair::Vector{Float64}
end
mutable struct JugInf
linint::Bool
r::Vector{Float64}
s::Vector{Float64}
qmp::Vector{Float64}
end
struct CraCoo
numele_cra::Int64
cracoo_cra::Array{Float64,3}
end
#########################################
function findpoipair!(pairinf::PairInf,poicoo::Matrix{Float64},β::Float64,juginf::JugInf,cracoo::CraCoo)
poicell = Dict{Tuple{Int64,Int64},Vector{Int64}}()
putpoicell!(poicell,poicoo,β)
neicell = Vector{Tuple{Int64,Int64}}(undef,4)
for cellkey in keys(poicell)
value = poicell[cellkey]
incellpair!(pairinf,poicoo,value,β,juginf,cracoo)
neicellpair!(pairinf,poicoo,neicell,poicell,cellkey,β,juginf,cracoo)
end
return pairinf
end
function putpoicell!(poicell::Dict{Tuple{Int64,Int64},Vector{Int64}},poicoo_poi::Matrix{Float64},β::Float64)
for ithpoi_poi = 1:numpoi
x::Int64 = fld(poicoo_poi[1,ithpoi_poi],β)
y::Int64 = fld(poicoo_poi[2,ithpoi_poi],β)
z::Tuple{Int64,Int64} = (x,y)
push!(poicell[z],ithpoi_poi)
else
poicell[z] = [ithpoi_poi]
end
end
return poicell
end
function incellpair!(pairinf::PairInf,poicoo::Matrix{Float64},value::Vector{Int64},β::Float64,juginf::JugInf,cracoo::CraCoo)
numpoi_cell = length(value)
if numpoi_cell β₯ 2
for ithpoi_cell = 1:numpoi_cell-1
for jthpoi_cell = ithpoi_cell+1:numpoi_cell
ispair!(pairinf,poicoo,value[ithpoi_cell],value[jthpoi_cell],β,juginf,cracoo)
end
end
end
return pairinf
end
function neicellpair!(pairinf::PairInf,poicoo::Matrix{Float64},neicell::Vector{Tuple{Int64,Int64}},poicell::Dict{Tuple{Int64,Int64},Vector{Int64}},cellkey::Tuple{Int64,Int64},β::Float64,juginf::JugInf,cracoo::CraCoo)
neicell = getneicell!(neicell,cellkey)
for ithnei_cell = 1:4
hascell = get(poicell,neicell[ithnei_cell],nothing)
if !isnothing(hascell)
twocellpair!(pairinf,poicoo,poicell[cellkey],poicell[neicell[ithnei_cell]],β,juginf,cracoo)
end
end
return pairinf
end
function twocellpair!(pairinf::PairInf,poicoo::Matrix{Float64},ithcell::Vector{Int64},jthcell::Vector,β::Float64,juginf::JugInf,cracoo::CraCoo)
for ithpoi in ithcell
for jthpoi in jthcell
ispair!(pairinf,poicoo,ithpoi,jthpoi,β,juginf,cracoo)
end
end
return pairinf
end
function ispair!(pairinf::PairInf,poicoo::Matrix{Float64},ithpoi::Int64,jthpoi::Int64,β::Float64,juginf::JugInf,cracoo::CraCoo)
@views pairlen = calpairlen(poicoo[:,ithpoi],poicoo[:,jthpoi])
if pairlen β€ β #|| abs(pairlen-β) < 1.E-10
@views isintersect!(juginf,poicoo[:,ithpoi],poicoo[:,jthpoi],cracoo)
if !juginf.linint
pairinf.numpair_pair += 1
push!(pairinf.ithpoi_pair,ithpoi)
push!(pairinf.jthpoi_pair,jthpoi)
push!(pairinf.pairlen_pair,pairlen)
end
end
return pairinf
end
function getneicell!(neicell::Vector{Tuple{Int64,Int64}},cellkey::Tuple{Int64,Int64})
neicell[1] = (cellkey[1] + 1, cellkey[2])
neicell[2] = (cellkey[1], cellkey[2] - 1)
neicell[3] = (cellkey[1] + 1, cellkey[2] - 1)
neicell[4] = (cellkey[1] + 1, cellkey[2] + 1)
return neicell
end
function calpairlen(poicoo_ith::SubArray,poicoo_jth::SubArray)
pairlen = 0.0
for ithdim = 1:2
pairlen += (poicoo_ith[ithdim] - poicoo_jth[ithdim]) ^ 2
end
pairlen = sqrt(pairlen)
return pairlen
end
function isintersect!(juginf::JugInf,ithcoo_poi::SubArray,jthcoo_poi::SubArray,cracoo::CraCoo)
for ithele_cra = 1:cracoo.numele_cra
@views juginf.linint = juglinlin!(juginf.r,juginf.s,juginf.qmp,ithcoo_poi,jthcoo_poi,cracoo.cracoo_cra[:,1,ithele_cra],cracoo.cracoo_cra[:,2,ithele_cra])
if juginf.linint
break
end
end
end
function juglinlin!(r::Vector{Float64},s::Vector{Float64},qmp::Vector{Float64},pβ::SubArray,pβ::SubArray,qβ::SubArray,qβ::SubArray)
r .= pβ - pβ
s .= qβ - qβ
qmp .= qβ - pβ
linint = true
rts = r[1] * s[2] - r[2] * s[1]
qmptr = qmp[1] * r[2] - qmp[2] * r[1]
if rts > 0.0
if qmptr < 0.0 || qmptr > rts
linint = false
else
qmpts = qmp[1] * s[2] - qmp[2] * s[1]
if qmpts < 0.0 || qmpts > rts
linint = false
end
end
elseif rts < 0.0
if qmptr > 0.0 || qmptr < rts
linint = false
else
qmpts = qmp[1] * s[2] - qmp[2] * s[1]
if qmpts > 0.0 || qmpts < rts
linint = false
end
end
else
linint = false
end
return linint
end
###################################
const numpoi = 501 * 501
a = collect(0.0:0.002:1)
poicoo_x = repeat(transpose(a),501)[:]
poicoo_y = repeat(a,501)[:]
poicoo = zeros(Float64,2,numpoi)
poicoo[1,:] = poicoo_x
poicoo[2,:] = poicoo_y
β = 0.05
numpair_pair = 0
ithpoi_pair = Vector{Int64}()
jthpoi_pair = Vector{Int64}()
pairlen_pair = Vector{Float64}()
pairinf = PairInf(numpair_pair,ithpoi_pair,jthpoi_pair,pairlen_pair)
linint = true
r = zeros(Float64,2)
s = zeros(Float64,2)
qmp = zeros(Float64,2)
juginf = JugInf(linint,r,s,qmp)
numele_cra = 2
cracoo_cra = zeros(Float64,2,2,numele_cra)
cracoo_cra[:,:,1] = [0.45 0.45; 0.0 1.0]
cracoo_cra[:,:,2] = [0.55 0.55; 0.0 1.0]
cracoo = CraCoo(numele_cra,cracoo_cra)
@time findpoipair!(pairinf,poicoo,β,juginf,cracoo)


The result of @time is

94.980503 seconds (1.39 G allocations: 124.377 GiB, 7.29% gc time)


Then I mute the newly added function isintersect!, here is the code.

# using Cell List Algorithm to do the Fixed Radius Near Neighbor Problem
# newly add function: judging whether the point pair is intersect with a given crack, if yes, then donot save the pointpair
##########################################
using Revise
mutable struct PairInf
numpair_pair::Int64
ithpoi_pair::Vector{Int64}
jthpoi_pair::Vector{Int64}
pairlen_pair::Vector{Float64}
end
mutable struct JugInf
linint::Bool
r::Vector{Float64}
s::Vector{Float64}
qmp::Vector{Float64}
end
struct CraCoo
numele_cra::Int64
cracoo_cra::Array{Float64,3}
end
#########################################
function findpoipair!(pairinf::PairInf,poicoo::Matrix{Float64},β::Float64,juginf::JugInf,cracoo::CraCoo)
poicell = Dict{Tuple{Int64,Int64},Vector{Int64}}()
putpoicell!(poicell,poicoo,β)
neicell = Vector{Tuple{Int64,Int64}}(undef,4)
for cellkey in keys(poicell)
value = poicell[cellkey]
incellpair!(pairinf,poicoo,value,β,juginf,cracoo)
neicellpair!(pairinf,poicoo,neicell,poicell,cellkey,β,juginf,cracoo)
end
return pairinf
end
function putpoicell!(poicell::Dict{Tuple{Int64,Int64},Vector{Int64}},poicoo_poi::Matrix{Float64},β::Float64)
for ithpoi_poi = 1:numpoi
x::Int64 = fld(poicoo_poi[1,ithpoi_poi],β)
y::Int64 = fld(poicoo_poi[2,ithpoi_poi],β)
z::Tuple{Int64,Int64} = (x,y)
push!(poicell[z],ithpoi_poi)
else
poicell[z] = [ithpoi_poi]
end
end
return poicell
end
function incellpair!(pairinf::PairInf,poicoo::Matrix{Float64},value::Vector{Int64},β::Float64,juginf::JugInf,cracoo::CraCoo)
numpoi_cell = length(value)
if numpoi_cell β₯ 2
for ithpoi_cell = 1:numpoi_cell-1
for jthpoi_cell = ithpoi_cell+1:numpoi_cell
ispair!(pairinf,poicoo,value[ithpoi_cell],value[jthpoi_cell],β,juginf,cracoo)
end
end
end
return pairinf
end
function neicellpair!(pairinf::PairInf,poicoo::Matrix{Float64},neicell::Vector{Tuple{Int64,Int64}},poicell::Dict{Tuple{Int64,Int64},Vector{Int64}},cellkey::Tuple{Int64,Int64},β::Float64,juginf::JugInf,cracoo::CraCoo)
neicell = getneicell!(neicell,cellkey)
for ithnei_cell = 1:4
hascell = get(poicell,neicell[ithnei_cell],nothing)
if !isnothing(hascell)
twocellpair!(pairinf,poicoo,poicell[cellkey],poicell[neicell[ithnei_cell]],β,juginf,cracoo)
end
end
return pairinf
end
function twocellpair!(pairinf::PairInf,poicoo::Matrix{Float64},ithcell::Vector{Int64},jthcell::Vector,β::Float64,juginf::JugInf,cracoo::CraCoo)
for ithpoi in ithcell
for jthpoi in jthcell
ispair!(pairinf,poicoo,ithpoi,jthpoi,β,juginf,cracoo)
end
end
return pairinf
end
function ispair!(pairinf::PairInf,poicoo::Matrix{Float64},ithpoi::Int64,jthpoi::Int64,β::Float64,juginf::JugInf,cracoo::CraCoo)
@views pairlen = calpairlen(poicoo[:,ithpoi],poicoo[:,jthpoi])
if pairlen β€ β #|| abs(pairlen-β) < 1.E-10
# @views isintersect!(juginf,poicoo[:,ithpoi],poicoo[:,jthpoi],cracoo)
# if !juginf.linint
pairinf.numpair_pair += 1
push!(pairinf.ithpoi_pair,ithpoi)
push!(pairinf.jthpoi_pair,jthpoi)
push!(pairinf.pairlen_pair,pairlen)
# end
end
return pairinf
end
function getneicell!(neicell::Vector{Tuple{Int64,Int64}},cellkey::Tuple{Int64,Int64})
neicell[1] = (cellkey[1] + 1, cellkey[2])
neicell[2] = (cellkey[1], cellkey[2] - 1)
neicell[3] = (cellkey[1] + 1, cellkey[2] - 1)
neicell[4] = (cellkey[1] + 1, cellkey[2] + 1)
return neicell
end
function calpairlen(poicoo_ith::SubArray,poicoo_jth::SubArray)
pairlen = 0.0
for ithdim = 1:2
pairlen += (poicoo_ith[ithdim] - poicoo_jth[ithdim]) ^ 2
end
pairlen = sqrt(pairlen)
return pairlen
end
function isintersect!(juginf::JugInf,ithcoo_poi::SubArray,jthcoo_poi::SubArray,cracoo::CraCoo)
for ithele_cra = 1:cracoo.numele_cra
@views juginf.linint = juglinlin!(juginf.r,juginf.s,juginf.qmp,ithcoo_poi,jthcoo_poi,cracoo.cracoo_cra[:,1,ithele_cra],cracoo.cracoo_cra[:,2,ithele_cra])
if juginf.linint
break
end
end
end
function juglinlin!(r::Vector{Float64},s::Vector{Float64},qmp::Vector{Float64},pβ::SubArray,pβ::SubArray,qβ::SubArray,qβ::SubArray)
r .= pβ - pβ
s .= qβ - qβ
qmp .= qβ - pβ
linint = true
rts = r[1] * s[2] - r[2] * s[1]
qmptr = qmp[1] * r[2] - qmp[2] * r[1]
if rts > 0.0
if qmptr < 0.0 || qmptr > rts
linint = false
else
qmpts = qmp[1] * s[2] - qmp[2] * s[1]
if qmpts < 0.0 || qmpts > rts
linint = false
end
end
elseif rts < 0.0
if qmptr > 0.0 || qmptr < rts
linint = false
else
qmpts = qmp[1] * s[2] - qmp[2] * s[1]
if qmpts > 0.0 || qmpts < rts
linint = false
end
end
else
linint = false
end
return linint
end
###################################
const numpoi = 501 * 501
a = collect(0.0:0.002:1)
poicoo_x = repeat(transpose(a),501)[:]
poicoo_y = repeat(a,501)[:]
poicoo = zeros(Float64,2,numpoi)
poicoo[1,:] = poicoo_x
poicoo[2,:] = poicoo_y
β = 0.05
numpair_pair = 0
ithpoi_pair = Vector{Int64}()
jthpoi_pair = Vector{Int64}()
pairlen_pair = Vector{Float64}()
pairinf = PairInf(numpair_pair,ithpoi_pair,jthpoi_pair,pairlen_pair)
linint = true
r = zeros(Float64,2)
s = zeros(Float64,2)
qmp = zeros(Float64,2)
juginf = JugInf(linint,r,s,qmp)
numele_cra = 2
cracoo_cra = zeros(Float64,2,2,numele_cra)
cracoo_cra[:,:,1] = [0.45 0.45; 0.0 1.0]
cracoo_cra[:,:,2] = [0.55 0.55; 0.0 1.0]
cracoo = CraCoo(numele_cra,cracoo_cra)
@time findpoipair!(pairinf,poicoo,β,juginf,cracoo)


The result of  @time is

14.815069 seconds (4.10 k allocations: 5.316 GiB, 2.44% gc time)


As you can see, the memory allocation increases a lot ! But where are they?
I set a big number of points because I want to show the big difference of memory allocation. Of course you can try with less points, but remember to increase the \ell at the same time to make sure there a enough points in its neighboorhood.

Just a first guess, in these three lines, maybe you want to add a . in the - as well, to read, for example r .= p2 .- p1, to the the operation component by component?

(as a side note, for the nearest neighbour problem and cell lists, you may be interested in CellListMap.jl and/or NearestNeighbor.jl)

Thanks for your nice suggesttions! I will have a try tomorrow. According to my limited experience, avoiding the vector operation and do it with explicit for loop (or with .) can really reduce the memory allocation. I will have a look at the packages you mentioned tomorrow. Thanks again!

I have tried it here, and it reduces the allocations to the same as you have in the other example.

The reason is that p2 - p1, without the dot, allocates a new array, which is then being broadcasted to the elements of r. With the dots in all operations, or using @. r = p2 - p1, there is loop fusion, and no intermediate vector is created.

Both packages there have fixed radius nearest neighbor functions implemented (here and here).

The CellListMap.jl is particularly useful if you do not actually need the neighbour list, but want to compute some property which is dependent on pairwise distances instead. It can also be faster depending on the type of data you have, and runs in parallel (Iβm the developer, so fell free to ask any question, if you want).

1 Like

I have tried it too, the memory allocation now is acceptable. Thank you a lot!
Actually I have visited your website and found that you are doing interseting research with MD simulation. I am currently working with peridynamics, which is a coarse-grained version of MD. Recently, I start to think that is there some advanced algorithms/theory/data structures about effectively calculating the damage field in peridynamcis, which is similar to calculate the force applied to one point by its neighbor points in MD simulation. This process costs a lot of time during simulation because you have to loop over all the point pairs, which is a big number compared to the number of points. Multithreading is a good method, but I use the @threads before the for loop, the memory allocation increase a lot and the program is slower. I found that CellListMap.jl succeeds in parallel computing, and I am going to learn about your code. Maybe I should look into some computational integral geometry books. Thanks for your kindly help! And I wanna know is it ok to send you some emails for further discussion about the advanced algorithms .
[/quote]

1 Like

From my very brief reading about perydinamics, something like CellListMap.jl may apply, if you use the interaction within-horizon approach (which is described in this video, for example: Peridynamics Theory Basics | What is peridynamics? - YouTube). But I donβt know much about the field.

Thanks for your kindly suggestions. Of course CellListMap.jl can apply in PD and I will have a try.

1 Like