I want to optimize this function, and I know I have some unneeded allocations. I'm just wondering what the best choice is to fix a few lines:

```julia
function singleAtomDistances!(distMat::Array{Float64,3}, crystal::Crystal, LJ::LJ,
                              centerAtom::Vector{Float64}, centerType::Integer, loopBounds::Vector{Int64})
    # ljvals = zeros(3,2) # Specific to binary material. Needs generalized to n-ary case.
    for (iNeighbor, aType) in enumerate(crystal.atomicBasis), neighboratom in aType  # Loop over the different atom types.
        # These three inner loops find all of the periodic images of a neighboring atom.
        for i = -loopBounds[1]:loopBounds[1], j = -loopBounds[2]:loopBounds[2], k = -loopBounds[3]:loopBounds[3]
            newAtom = neighboratom + [i, j, k]  # I know this is a problem
            newCart = DirectToCartesian(crystal.latpar * crystal.lVecs, newAtom)
            r = norm(newCart - centerAtom)
            if r < LJ.cutoff && !isapprox(r, 0, atol=1e-3)
                # If the neighbor atom is inside the unit cell, it's going to be
                # double counted at some point when we center on the other atom,
                # so we count it as half each time.
                # The LJ parameters are stored in the upper triangular portion of a matrix;
                # the bottom triangle is redundant: the interaction between a and b is
                # equivalent to the interaction between b and a. So I sort the indices here
                # so that the bottom triangle of the matrix never gets updated, only the upper right.
                indices = sort([iNeighbor, centerType])
                if all([i, j, k] .== 0)  # Allocation here too.
                    distMat[indices..., 1] += 4 * 1.0/2.0 * 1.0/r^6  # Splatting causes allocations.
                    distMat[indices..., 2] += 4 * 1.0/2.0 * 1.0/r^12
                else
                    distMat[indices..., 1] += 4 * 1.0/r^6
                    distMat[indices..., 2] += 4 * 1.0/r^12
                end
            end
        end
    end
    return distMat
end
```

This is a poster child for using StaticArrays.jl to represent your Cartesian coordinate vectors.

Also, your distMat indices are in the wrong order for locality.
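Julia arrays are column-major, so the first index varies fastest in memory. A hedged sketch of a friendlier layout, assuming you are free to reorder distMat's dimensions, would put the small 2-element term dimension first so the two updates for one atom pair land next to each other:

```julia
# Hypothetical re-layout: (term, typeA, typeB) instead of (typeA, typeB, term).
# Julia is column-major, so the first index varies fastest in memory; the two
# updates for one atom pair then touch adjacent elements.
distMat = zeros(2, 3, 3)
a, b = 1, 2                   # sorted type indices for one interaction
r = 1.5                       # example distance
distMat[1, a, b] += 4 / r^6   # r^-6 term
distMat[2, a, b] += 4 / r^12  # r^-12 term; adjacent in memory to the write above
```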

```julia
indices = sort([iNeighbor,centerType])
```

This also allocates. You could use `indices = iNeighbor < centerType ? (iNeighbor,centerType) : (centerType,iNeighbor)`, or use something like TupleSorting.jl. Note the use of tuples rather than arrays for small fixed-length containers.
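As an aside (assuming a two-element sort is all you need here), Base's minmax already returns the pair as a sorted tuple with no allocation:

```julia
iNeighbor, centerType = 3, 1
indices = minmax(iNeighbor, centerType)  # (1, 3), a Tuple -- nothing allocated
```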

```julia
all([i,j,k] .== 0 )
```

This also allocates. You can do all(iszero, (i,j,k)) or simply i == j == k == 0. Or, if you have a static array ijk = @SVector[i,j,k] from above, you could just do iszero(ijk).
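You can check the difference yourself with @allocated (a rough sketch; exact byte counts vary across Julia versions):

```julia
# Wrap each spelling in a function so @allocated measures the expression
# itself rather than global-scope overhead.
alloc_array(i, j, k) = @allocated all([i, j, k] .== 0)
alloc_tuple(i, j, k) = @allocated all(iszero, (i, j, k))

alloc_array(0, 0, 0); alloc_tuple(0, 0, 0)  # warm up so compilation isn't counted
println(alloc_array(0, 0, 0))  # nonzero: builds a Vector and a BitVector
println(alloc_tuple(0, 0, 0))  # zero: the tuple never touches the heap
```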

Beware the Matlab/Python habit of using arrays for every tiny calculation.

If you have a loop with an integer i and you do i += 1 on every iteration, do you worry about the value of i changing on every iteration? The situation with StaticArrays is the same.

If you do

```julia
for neighboratom in aType  # aType is an array of 3-component SVectors
    newAtom = neighboratom + @SVector[i, j, k]
    ...
end
```

it "creates" new SVectors on each iteration, but these "new" vectors cost essentially nothing to create, the same way you don't worry about "creating" new integers with i += 1. Technically, because the compiler knows the length, 3-component static arrays don't need to be allocated on the heap: they can be put on the stack, or even stored implicitly in 3 registers.

Basically, the compiled code will be the same as if you had manually written out different scalar variables for each component newAtom_x, newAtom_y, and newAtom_z.

The upshot is that I would recommend using SVector (from StaticArrays.jl) for all of your 3-component coordinate vectors, everywhere in your code.
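Putting the suggestions together, here is a hedged sketch of the function with SVector coordinates. The Crystal/LJ structs and the DirectToCartesian stub below are minimal stand-ins for your real ones (assumptions for illustration, not drop-in replacements):

```julia
using StaticArrays, LinearAlgebra

# Minimal stand-ins for the questioner's types (assumptions for illustration):
struct Crystal
    latpar::Float64
    lVecs::SMatrix{3,3,Float64,9}
    atomicBasis::Vector{Vector{SVector{3,Float64}}}
end
struct LJ
    cutoff::Float64
end
# Stub: direct (fractional) coordinates -> Cartesian is the lattice matrix times d.
DirectToCartesian(A, d) = A * d

function singleAtomDistances!(distMat, crystal::Crystal, lj::LJ,
                              centerAtom::SVector{3,Float64},
                              centerType::Integer, loopBounds::SVector{3,Int})
    A = crystal.latpar * crystal.lVecs               # hoist the matrix product
    for (iNeighbor, aType) in enumerate(crystal.atomicBasis), neighboratom in aType
        for i = -loopBounds[1]:loopBounds[1], j = -loopBounds[2]:loopBounds[2],
            k = -loopBounds[3]:loopBounds[3]
            newAtom = neighboratom + SVector(i, j, k)  # stack "allocation": free
            r = norm(DirectToCartesian(A, newAtom) - centerAtom)
            if r < lj.cutoff && !isapprox(r, 0, atol=1e-3)
                a, b = minmax(iNeighbor, centerType)   # sorted tuple, no sort([...])
                w = i == j == k == 0 ? 0.5 : 1.0       # half-count home-cell pairs
                distMat[a, b, 1] += 4 * w / r^6
                distMat[a, b, 2] += 4 * w / r^12
            end
        end
    end
    return distMat
end

# Tiny usage example: one atom type on a simple cubic lattice.
crystal = Crystal(1.0, SMatrix{3,3}(1.0I), [[SVector(0.0, 0.0, 0.0)]])
dm = singleAtomDistances!(zeros(1, 1, 2), crystal, LJ(1.1),
                          SVector(0.0, 0.0, 0.0), 1, SVector(1, 1, 1))
# 6 nearest images at r = 1 fall inside the cutoff, each contributing 4/r^6 = 4
```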

Thank you... and I'm with you... I just don't understand how the compiler knows the length of @SVector[i,j,k], but doesn't know the length of [i,j,k]

PS. In general, it will be a lot easier to help you if you provide minimal runnable examples. That means providing all necessary functions (or stripping/stubbing them out), providing sample data, etcetera. Please read: make it easier to help you

The basic thing to keep in mind is that Julia's compiler (like many compilers) specializes code mainly based on the types of objects, not based on their values (which can change at runtime). [i,j,k] is a Vector{Int}, which does not contain the length of the array in the type. In principle, the compiler could be made clever enough to infer it in this case, essentially by constant propagation, since [i,j,k] appears literally in the source code, though it doesn't currently do this AFAIK. (Even that analysis can be tricky, e.g. because various functions can resize a Vector at runtime.) But what about all of your other vectors, like neighboratom? How is the compiler supposed to know their lengths?

In contrast, @SVector[i,j,k] creates an object of type SVector{3,Int}, where the length 3 is part of the type (and internally to the SVector the data is stored in a Tuple{Int,Int,Int}, for which the length is also part of the type, and the compiler has special optimizations for tuples). So, the compiler can specialize on this in several ways:

1. Local-variable SVector objects need not be stored on the heap, and can be stored on the stack or in registers (exactly as if the individual components were separate scalar variables).

2. Arrays of SVector objects can store them "inline" in the array data. E.g. a Vector{SVector{3,Float64}} stores an arbitrary number of 3-component SVectors, each as 3 numbers one after another in the array data (exactly like an array of Float64 of 3x the length). And when you access an element of the array, the compiler knows it is a 3-component SVector and can specialize the resulting code using (1) and (3).

3. Operations on SVector can be specialized on the length and completely unrolled. For example, adding two SVectors can be accomplished with 3 scalar additions and no loop.
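You can see the type difference directly (a small sketch; isbitstype is Base's check for "plain bits that can live on the stack"):

```julia
using StaticArrays

v = [1, 2, 3]            # Vector{Int64}: length is runtime data on the heap
s = @SVector [1, 2, 3]   # SVector{3,Int64}: the 3 is part of the type
println(typeof(v))       # the type carries no length information
println(typeof(s))       # the type itself says "3 elements"
isbitstype(typeof(s))    # true: stack/register friendly
isbitstype(typeof(v))    # false: heap-allocated container
```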

Right... The type of the data is so key in Julia. I've got to remember that. Thank you. Sometimes I wish I hadn't learned Python first, because it trains you to not think about types at all. My knee-jerk reaction when I think I have too many allocations is to pre-allocate the array and then modify the contents as needed. So creating a new vector at each iteration doesn't register for me yet as optimal.

Splatting a Tuple or SVector is (almost always) fine because the compiler knows exactly how many elements will appear in the splat. Splatting a Vector is bad because the compiler does not know how long it will be (so it will usually cause dynamic dispatch).
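A small illustration (f is a hypothetical function, just for the demo):

```julia
f(args...) = +(args...)  # sums however many arguments it receives

t = (1, 2, 3)   # Tuple{Int,Int,Int}: the compiler knows the splat yields 3 args
v = [1, 2, 3]   # Vector{Int}: the splat length is only known at runtime
f(t...)         # compiles to a plain 3-argument call
f(v...)         # usually a dynamic dispatch on the runtime length
```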

You can also use CartesianIndex to group adjacent indices into a single variable. Like zeros(3,3,3)[CartesianIndex(2, 2), 2].
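For example (a sketch; the distMat line at the end is an assumption about how this would slot into the code above):

```julia
A = zeros(3, 3, 3)
idx = CartesianIndex(2, 2)  # bundles the first two indices in one variable
A[idx, 3] = 1.0             # addresses the same element as A[2, 2, 3]
A[2, 2, 3]                  # reads back the value just written
# CartesianIndex wraps a tuple, so building one does not allocate. It also
# accepts a tuple directly, e.g. (hypothetically, in the code above):
#   distMat[CartesianIndex(minmax(iNeighbor, centerType)), 1] += ...
```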