The following oversimplified extract from my code shows a section that allocates memory at every loop iteration.
I think this should not happen, and I’d like advice and suggestions on how to optimize or correct it.
cache_unique_edges = array_cache(conn_unique_edges) # allocation done here to prevent further allocations later.
for iel = 1:nelem
    for e = 1:E
        for g = 1:G
            ai = getindex!(cache_unique_edges, conn_unique_edges, g)
            if (ai == ai)
                # THIS if STATEMENT CAUSES OVER-ALLOCATION, which I want to avoid!
            end
        end
    end
end
With the if statement commented out, the timing and allocations are: [ Info: 17.088092 seconds (37.64 k allocations: 2.114 MiB)
However, when the code executes if (ai == ai), the timing and allocations are: [ Info: 35.950601 seconds (676.71 M allocations: 10.085 GiB, 1.46% gc time)
Hi @Sukera, thanks for replying. I am extracting a working example for you to test. Because it is part of a larger code base that I am developing, you will need to run it from within its own --project=. and add a couple of libraries.
I hope that is OK. I’ll post a GitHub link shortly.
I don’t know where array_cache comes from, but my guess is that the function allocates a new array internally?
This creates two new arrays per iteration; together with the surrounding loops, that’s a total of NLOCAL * NEL * NGLOBAL * 2 allocations. Either use a tuple (so (ai, ai) etc.) or write the comparison explicitly.
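To illustrate the difference, here is a minimal sketch. It assumes the real condition compares two array literals, which the simplified snippet above does not show. The function names and the vectors `a` and `b` are hypothetical stand-ins, not part of the original code.

```julia
# Hypothetical stand-in: compares neighbouring index pairs using array literals.
function count_matches_array(a::Vector{Int}, b::Vector{Int})
    n = 0
    for i in 1:length(a)-1
        # Each [...] literal constructs a new Vector, typically on the heap,
        # so this line can allocate twice per iteration.
        if [a[i], a[i+1]] == [b[i], b[i+1]]
            n += 1
        end
    end
    return n
end

# Same comparison written with tuples instead of arrays.
function count_matches_tuple(a::Vector{Int}, b::Vector{Int})
    n = 0
    for i in 1:length(a)-1
        # Tuples of plain integers do not need heap storage.
        if (a[i], a[i+1]) == (b[i], b[i+1])
            n += 1
        end
    end
    return n
end

# Measure inside functions so that global-scope effects are not counted.
measure_array(a, b) = @allocated count_matches_array(a, b)
measure_tuple(a, b) = @allocated count_matches_tuple(a, b)

a = collect(1:1000)
b = collect(1:1000)

# Run once first so that compilation is excluded from the measurement.
measure_array(a, b); measure_tuple(a, b)

alloc_array = measure_array(a, b)
alloc_tuple = measure_tuple(a, b)
println("array literals: ", alloc_array, " bytes, tuples: ", alloc_tuple, " bytes")
```

On typical Julia versions the tuple variant should report far fewer bytes than the array variant (usually none at all). Writing the comparison explicitly, for example `a[i] == b[i] && a[i+1] == b[i+1]`, should behave like the tuple variant.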
I do not have a local MPI setup, so I can’t really run your code, sorry. I also don’t know where getindex! is coming from: GitHub code search does not show any hits in your repository, so I don’t know what its type would be and thus can’t really figure out which getindex method on ai would be called. However, since you report the same behavior with standard arrays, I’m assuming the getindex itself does not allocate.