How to speed the function H in this code? (Hausdorff Chirality Measurement)

Luis_Manuel_Espinoza

1m

I am trying to do the Hausdorff Chirality Measurement from the article by Andrzej B. Buda and Kurt Mislow* but as I thought it is taking too long because to obtain a reliable value, I need to take “n” greater than 1,000,000 and it is with structures of 35 atoms It takes me almost 20 minutes, I was thinking about how to speed it up but I haven’t been able to make progress, I don’t know if anyone has an idea to speed it up, I would try to do it with parallel computing but I have no experience.

function Compute_of_CM_R_CCM(coords)
    # Compute the center of mass of the atoms
    n = size(coords, 1)
    center_of_mass = sum(coords, dims=1) / n
    # Compute the distances between the center of mass and each atom
    distances = zeros(n)
    for i in 1:n
        distances[i] = norm((center_of_mass.-coords)[i,:])
    end
    # Compute the radius of the smallest circle that encloses the atoms
    radius = maximum(distances)
    #Compute the coordinates translated to the center of mass
    coordinates_origin = coords .- center_of_mass
    
    return center_of_mass, radius, coordinates_origin
end

 function ρ(Q,Qp) 
    n = size(Q,1)
    sup = []
    for i in 1:n
        inf = []
        for j in 1:n
            dis = norm(Q[i,:].-Qp[j,:])
            push!(inf,dis)
        end
        push!(sup,minimum(inf))
    end
    return maximum(sup)    
end

function dQ(coordenadas)
    n = size(coordenadas,1)
    D = []
    for i in 1:n 
        distan = []
        for j in i+1:n
            d = norm(coordenadas[i,:]-coordenadas[j,:])
            push!(D,d)
        end
        end
    return maximum(D)
end

function H(q,qp,n)
    A = Compute_of_CM_R_CCM(q)[3]
    B = Compute_of_CM_R_CCM(qp)[3]
    HH = []
    for i in 1:n
        BB = B*qr(randn(3, 3)).Q
        HCM = max(ρ(A,BB),ρ(BB,A))/dQ(A)
        push!(HH,HCM)
    end
    return minimum(HH)
end

thanks for your time!

Sorry, I forgot to add an example of coordinates that are being used:

s = [ 1.73761 -2.3299 4.09897
   1.7145 -1.88395 2.6064
   0.810808 -3.53329 4.43137
   1.35042 -1.47855 4.72023
   2.18376 -2.67875 1.98346
   2.32905 -0.962011 2.49604
   3.08037 -3.63788 4.84995
   3.53686 -2.0441 5.09464
  -0.656357 -2.61487 3.56265
   3.11152 -2.69374 4.42536
   1.24236 -4.55928 4.94814
   2.0e-6 -1.53167 1.96559
  -0.51205 -3.38046 4.19649]

There are many inefficiencies in your code, but I’ll point out two important ones:

Firstly,

Make sure to never initialize vectors like this. These create a Vector{Any}, which means that the compiler has no way of knowing what kind of values are contained therein, and therefore cannot create fast specialized instructions.

Secondly,

Here (and other places, too) you painstakingly build a vector, but in the end you throw it away, just keeping the maximum. Instead just keep the running max:

function dQ(coordenadas)
    n = size(coordenadas,1)
    dmax = 0.0
    for i in 1:n 
        for j in i+1:n
             # faster: d = @views norm(coordenadas[:,i]-coordenadas[:,j])
            d = norm(coordenadas[i,:]-coordenadas[j,:]) 
            dmax = max(dmax, d)
        end
    end
    return dmax
end

Always make sure to not create arrays if you don’t need them.

There’s a lot of other things as well, but those two struck me particularly.

BTW, if you want help to really speed this up, please provide example input data, so that posters can run the code themselves. Right now it’s not easy to guess what sort of input data should be used.

I’m sorry, I didn’t think it through

Can you show how you call H? It takes three different input arguments, how does s relate to those?

Ahhh the second argument could be ss =-1*s and n is an Int that helps us increase the precision of the method, the function H is defined below the other functions above.

But thank you very much, the two observations you made previously have made me see those errors.