Optimizing function with vector reassignments

Wanting to optimize this function and I know I have some un-needed allocations. Just wondering what the best choice is to fix a few lines:

function singleAtomDistances!(distMat::Array{Float64,3},crystal::Crystal,LJ::LJ,centerAtom::Vector{Float64}, centerType:: Integer, loopBounds::Vector{Int64})
#    ljvals = zeros(3,2)  #Specific to binary material.  Needs generalized to n-ary case.
    for (iNeighbor,aType) in enumerate(crystal.atomicBasis), neighboratom in aType  #Loop over the different atom types.
            # And these three inner loops are to find all of the periodic images of a neighboring atom.
        for i = -loopBounds[1]:loopBounds[1], j = -loopBounds[2]:loopBounds[2], k= -loopBounds[3]:loopBounds[3]
            newAtom = neighboratom + [i, j, k]   # I know this is a problem
            newCart = DirectToCartesian(crystal.latpar * crystal.lVecs,newAtom)
            r = norm(newCart - centerAtom) 
            if r < LJ.cutoff && !isapprox(r,0,atol = 1e-3)
                #If the neighbor atom is inside the unit cell, then its going to be
                # double counted at some point when we center on the other atom.  
                # So we count it as half each time.
                # The LJ parameters are stored in the upper triangular portion of a matrix
                # The bottom triangle is redundant.. interaction between a and b is equivalent
                # to interaction between b and a.  So I sort the indices here so that the bottom
                # triangle of the matrix never gets updated, only the upper right.
                indices = sort([iNeighbor,centerType])
                if all([i,j,k] .== 0 )     # Allocation here too.
                    distMat[indices...,1] +=  4 * 1.0/2.0 * 1.0/r^6    #Splatting causes allocations.
                    distMat[indices...,2] +=  4* 1.0/2.0 * 1.0/r^12
                else 
                    distMat[indices...,1] += 4 * 1.0/r^6
                    distMat[indices...,2] += 4 * 1.0/r^12
                end
            end
        end
    end
    return distMat
end
end

Thanks in advance.

This is a poster child for using StaticArrays.jl to represent your Cartesian coordinate vectors.

Also, your distMat indices are in the wrong order for locality.

indices = sort([iNeighbor,centerType])

This also allocates. You could use indices = iNeighbor < centerType ? (iNeighbor,centerType) : (centerType,iNeighbor), or use something like TupleSorting.jl — note the use of tuples rather than arrays for small fixed-length containers.

all([i,j,k] .== 0 )

This also allocates. You can do all(iszero, (i,j,k)) or simply i == j == k == 0. Or, if you have a static array ijk = @SVector[i,j,k] from above, you could just do iszero(ijk).

Beware the Matlab/Python habit of using arrays for every tiny calculation.

2 Likes

But the values of the vector change each iteration.

I know I’m still very Pythonic in my coding and trying to break that habit. But I also want elegant code so … trying to find the balance.

If you have a loop with an integer, i and do i += 1 on every iteration, do you worry about the value of i changing on every iteration? The situation with StaticArrays is the same.

If you do

for neighboratom in aType # aType is an array of 3-component SVectors
    newAtom = neighboratom + @SVector[i, j, k]
    ...
end

it “creates” new SVectors on each iterations, but these “new” vectors cost essentially nothing to create, the same as you don’t worry about “creating” new integers with i += 1. Technically, because the compiler knows the length, 3-component static arrays don’t need to be allocated on the heap — they can be put on the stack, or even stored implicitly in 3 registers.

Basically, the compiled code will be the same as if you had manually written out different scalar variables for each component newAtom_x, newAtom_y, and newAtom_z.

The upshot is that I would recommend using SVector (from StaticArrays.jl) for all of your 3-component coordinate vectors, everywhere in your code.

3 Likes

Thank you… and I’m with you… I just don’t understand how the compiler knows the length of @SVector[i,j,k], but doesn’t know the length of [i,j,k]

PS. In general, it will be a lot easier to help if you if you provide minimal runnable examples. That means providing all necessary functions (or stripping/stubbing them out), providing sample data, etcetera. Please read: make it easier to help you

The basic thing to keep in mind is that Julia’s compiler (like many compilers) specializes code mainly based on the types of objects, not based on their values (which can change at runtime). [i,j,k] is a Vector{Int}, which does not contain the length of the array in the type. In principle, the compiler could be made clever enough to infer it in this case — essentially by constant propagation — since [i,j,k] appears literally in the source code, though it doesn’t currently do this AFAIK. (Even that analysis can be tricky, e.g. because various functions can resize a Vector at runtime.) But what about all of your other vectors, like neighboratom? How is the compiler supposed to know their lengths?

In contrast, @SVector[i,j,k] creates an object of type SVector{3,Int}, where the length 3 is part of the type (and internally to the SVector the data is stored in a Tuple{Int,Int,Int}, for which the length is also part of the type, and the compiler has special optimizations for tuples). So, the compiler can specialize on this in several ways:

  1. local-variable SVector objects need not be stored on the heap, and can be stored on the stack or in registers (exactly as if the individual components were separate scalar variables)
  2. arrays of SVector objects can store them “inline” in the array data. e.g. a Vector{SVector{3,Float64}} stores an arbitrary number of 3-component SVectors, each as 3 numbers one after another in the array data (exactly like an array of Float64 of 3x the length). And when you access an element of the array, the compiler knows it is a 3-component SVector and can specialize the resulting code using (1) and (3).
  3. operations on SVector can be specialized on the length and completely unrolled. For example, adding two SVectors can be accomplished with 3 scalar additions and no loop.
5 Likes

Right… The type of the data is so key in Julia. I’ve got to remember that. Thank you. Sometimes I wish I hadn’t learned Python first because it trains you to not think about types at all. My knee jerk reaction when I think I have too many allocations is to pre-allocate the array and then modify the contents as needed. So creating a new vector at each iteration doesn’t register for me yet as optimal.

See also:

1 Like

What about the splatting. I think this is allocating too. I mean indices isn’t very long so I suppose I can just type the indices out

distMat[indices[1],indices[2],1] +=  4. * 1.0/2.0 * 1.0/r^6

but is splatting a bad idea when trying to avoid allocations in general?

Splatting a Tuple or SVector is (almost always) fine because the compiler knows exactly how many elements will appear in the splat. Splatting a Vector is bad because the compiler does not know how long it will be (so it will usually cause dynamic dispatch).

You can also use CartesianIndex to group adjacent indices into a single variable. Like
zeros(3,3,3)[CartesianIndex(2, 2), 2].

2 Likes