Help with optimization

I have the following function that is going to be called hundreds of thousands of times, and right now it’s running way too slow.

function sample_σ!(metrop::LJ_metrop,data::DataSet,LJ::LJ,LJ_next::LJ,std)

    for j = 1:LJ.order, k = j:LJ.order
        #println(LJ.σ,metrop.σ_candSigs)
        cand = rand(metrop.proposal(LJ.σ[j,k],metrop.σ_candSigs[j,k]))
        if cand < 0.05
            continue
        end 

        LJ_next.σ[j,k] = cand  # Set sigma equal to the candidate draw

        # Evaluate the posterior at the candidate draw.
        numerator = metrop.logpost(data,LJ_next,std) + log(pdf(metrop.proposal(cand,metrop.σ_candSigs[j,k]),LJ.σ[j,k]))
        LJ_next.σ[j,k] = LJ.σ[j,k]  # Set sigma back to the previous draw.
    
        # Evaluate the posterior again.
        denominator =metrop.logpost(data,LJ_next,std)  + log(pdf(metrop.proposal(LJ.σ[j,k],metrop.σ_candSigs[j,k]),cand))
        r = numerator - denominator
        unif = log(rand(Uniform(0,1)))  # Draw from a uniform.
        if r >= 0 || ((r < 0) & (unif < r))  # Accept?
            LJ_next.σ[j,k] = cand
    #        metrop.params_draws[i,j,k,l] = cand   # Yes!
            metrop.σ_accept[j,k] += 1/metrop.nDraws
        end

    end
end

LJ.order = 2 so the nested loop inside the function is only running 3 iterations.

Here are the relevant user-defined types:

struct LJ_metrop
    nDraws:: Int64
    nBurnIn:: Int64

    # Acceptance rates for all of the parameters
    ϵ_accept:: UpperTriangular{Float64, Matrix{Float64}}
    σ_accept:: UpperTriangular{Float64, Matrix{Float64}}
    std_accept:: Vector{Float64}
    
    # Prior distributions
    ϵ_Priors:: Array{Distribution,2}#Vector{Distribution}
    σ_Priors:: Array{Distribution,2}
    std_Prior:: Distribution

    # Sigmas on proposal distributions
    ϵ_candSigs:: UpperTriangular{Float64, Matrix{Float64}}
    σ_candSigs:: UpperTriangular{Float64, Matrix{Float64}}
    std_candSig:: Float64

    # Initial guess
    std:: Float64


    # Proposal distribution
    proposal:: Function

    # Posterior
    logpost:: Function
end

struct LJ <: energyModel
    order:: Int
  #  params:: Array{Float64,3}
    cutoff:: Float64

    # Replace params above.
    σ:: UpperTriangular{Float64, Matrix{Float64}}
    ϵ:: UpperTriangular{Float64, Matrix{Float64}}

end

mutable struct Crystal
    title::String
    latpar::Float64
    lVecs:: Matrix{Float64}
    nType:: Vector{Int64} #Number of each type of atom 
    aType:: Vector{Int64} # Integers representing the type for each basis atom
    nAtoms::Int64  # Total number of atoms in the unit cell
    coordSys::Vector{String} # 'D' for Direct or 'C' for cartesian
    atomicBasis:: Vector{Vector{Vector{Float64}}}  # List of all the atomic basis vector separated by type 
    species:: Vector{String}  # Species of the atoms 
    energyFP:: Float64  # First principles energy
    modelEnergy:: Float64 # Model energy
    order::Int64 # binary, ternary, etc.
    r6::UpperTriangular{Float64, Matrix{Float64}}
    r12::UpperTriangular{Float64, Matrix{Float64}}
end


and here are the two functions that are being called:

proposal(μ,σ) = Gamma(μ^2/σ^2,σ^2/μ)
logPost(data,model,σ) = MatSim.logNormal(data,model,σ) + sum(logpdf.(σ_Priors,model.σ)) + sum(logpdf.(ϵ_Priors,model.ϵ)) + logpdf(std_Prior,σ)

I’m not asking you to spend a lot of time optimizing my code. Just wondering if anyone can see any glaring problems that would help out. Also, just trying to learn the general tips/tricks for optimizing julia. I still don’t have good intuition/ habits for optimizing in julia so trying to build that.

Right now when I call this function 50,000 times, it’s allocating a little over 3.0 G and taking over a minute to execute. Feels like I could do better. Thanks in advance

These are abstract types. Probably is good to parameterize them (struct A{F} f::F end).

Except for this, which might or not affect your performance, there’s nothing obvious in terms of coding style.

ps: is Distribution concrete?

Not sure. How do you find out?

Are you saying to make another user-defined type for logpost and proposal or just parameterize LJ_metrop?

This one


struct LJ_metrop{T}
    proposal::T
    logpost::T

end

???

Then I call the constructor like this:

LJ_metrop{Function}(LP,prop)

No, each function has its own type, then:

struct LJ_metrop{F<:Function,G<:Function}
    proposal::F
    logpost::G

end

and you can call it as before. You might want also to parameterize the Distribution field, if it can be different types of distributions (if it is coming from the Distributions package, it is abstract):

julia> isconcretetype(Distribution)
false

Then, instead of using Distribution inside the struct definition, use:

struct LJ_metrop{F<:Function, G<:Function, D<:Distribution}
    proposal::F
    logpost::G
    ϵ_Priors:: Matrix{D}#Vector{Distribution}
    σ_Priors:: Matrix{D}
    std_Prior:: D

(if all the field expected the same distribution, otherwise use different parameters for each).

The <: syntax means that I don’t have to specify the type when calling the constructor? Also, why use both F and G rather than having both the function be of type F say?

By parameterizing the type, I’m essentially forcing the caller to specify the type of the fields which makes the user-defined type concrete even if the field types are abstract… Did I get that right?

You are saying that the field is one specific concrete subtype of the abstract type Function.

The <:Function is just to throw an error if the user tries to initialize the structure with something else:

julia> struct A{F}
           f::F
       end

julia> A(1.0)
A{Float64}(1.0)

julia> struct B{F<:Function}
           f::F
       end

julia> B(1.0)
ERROR: MethodError: no method matching B(::Float64)

Closest candidates are:
  B(::F) where F<:Function
   @ Main REPL[7]:2

Stacktrace:
 [1] top-level scope
   @ REPL[8]:1

julia> B(sin)
B{typeof(sin)}(sin)

Don’t think UpperTriangular is concrete either. This is going to increase the length of the constructor call with 4 types to specify…

This seems concrete (it is one specific subtype of UpperTriangular with those parametric types.

:face_exhaling:
It didn’t speed things up but that was good information… Thank you… One more question. I tried parameterizing the functions, but not the distribution, like this:

struct LJ_metrop{F<:Function, G<:Function}
    proposal::F
    logpost::G
    std_Prior::Distribution
end

Then I checked to see if the type was concrete:

isconcretetype(LJ_metrop{Function,Function})

and it returned true???

It is a concrete type with abstract fields.

From the aspect of the code you should probably benefit from making everything concrete.

Don’t you have MWE?

Not up with your acronym. MWE?

You can probably improve these using

sum(p -> logpdf(p,model.σ), σ_Priors)

which will avoid the (I think) creation of the intermediate array.

Minimal working example.

Something others can run to test.

Minimal working example. Got it… The code is pretty long and complex, but I’ve shown you all the critical parts. There are several input/data files to read in so I don’t know if I could provide something that you could run.

It’s on github but I haven’t added the changes that I’ve discussed here. (git@github.com:lancejnelson/MatSim.jl.git)

If this function is the one to be optimized, it will be useful to have a set of inputs, even if with random data, that represents a single run. Then other people might jump in and take the last bit of performance from your code.

1 Like