Help with optimization

lancejnelson · September 28, 2024, 10:50pm

I have the following function that is going to be called hundreds of thousands of times, and right now it’s running way too slow.

function sample_σ!(metrop::LJ_metrop,data::DataSet,LJ::LJ,LJ_next::LJ,std)

    for j = 1:LJ.order, k = j:LJ.order
        #println(LJ.σ,metrop.σ_candSigs)
        cand = rand(metrop.proposal(LJ.σ[j,k],metrop.σ_candSigs[j,k]))
        if cand < 0.05
            continue
        end 

        LJ_next.σ[j,k] = cand  # Set sigma equal to the candidate draw

        # Evaluate the posterior at the candidate draw.
        numerator = metrop.logpost(data,LJ_next,std) + log(pdf(metrop.proposal(cand,metrop.σ_candSigs[j,k]),LJ.σ[j,k]))
        LJ_next.σ[j,k] = LJ.σ[j,k]  # Set sigma back to the previous draw.
    
        # Evaluate the posterior again.
        denominator =metrop.logpost(data,LJ_next,std)  + log(pdf(metrop.proposal(LJ.σ[j,k],metrop.σ_candSigs[j,k]),cand))
        r = numerator - denominator
        unif = log(rand(Uniform(0,1)))  # Draw from a uniform.
        if r >= 0 || ((r < 0) & (unif < r))  # Accept?
            LJ_next.σ[j,k] = cand
    #        metrop.params_draws[i,j,k,l] = cand   # Yes!
            metrop.σ_accept[j,k] += 1/metrop.nDraws
        end

    end
end

LJ.order = 2 so the nested loop inside the function is only running 3 iterations.

Here are the relevant user-defined types:

struct LJ_metrop
    nDraws:: Int64
    nBurnIn:: Int64

    # Acceptance rates for all of the parameters
    ϵ_accept:: UpperTriangular{Float64, Matrix{Float64}}
    σ_accept:: UpperTriangular{Float64, Matrix{Float64}}
    std_accept:: Vector{Float64}
    
    # Prior distributions
    ϵ_Priors:: Array{Distribution,2}#Vector{Distribution}
    σ_Priors:: Array{Distribution,2}
    std_Prior:: Distribution

    # Sigmas on proposal distributions
    ϵ_candSigs:: UpperTriangular{Float64, Matrix{Float64}}
    σ_candSigs:: UpperTriangular{Float64, Matrix{Float64}}
    std_candSig:: Float64

    # Initial guess
    std:: Float64


    # Proposal distribution
    proposal:: Function

    # Posterior
    logpost:: Function
end

struct LJ <: energyModel
    order:: Int
  #  params:: Array{Float64,3}
    cutoff:: Float64

    # Replace params above.
    σ:: UpperTriangular{Float64, Matrix{Float64}}
    ϵ:: UpperTriangular{Float64, Matrix{Float64}}

end

mutable struct Crystal
    title::String
    latpar::Float64
    lVecs:: Matrix{Float64}
    nType:: Vector{Int64} #Number of each type of atom 
    aType:: Vector{Int64} # Integers representing the type for each basis atom
    nAtoms::Int64  # Total number of atoms in the unit cell
    coordSys::Vector{String} # 'D' for Direct or 'C' for cartesian
    atomicBasis:: Vector{Vector{Vector{Float64}}}  # List of all the atomic basis vector separated by type 
    species:: Vector{String}  # Species of the atoms 
    energyFP:: Float64  # First principles energy
    modelEnergy:: Float64 # Model energy
    order::Int64 # binary, ternary, etc.
    r6::UpperTriangular{Float64, Matrix{Float64}}
    r12::UpperTriangular{Float64, Matrix{Float64}}
end

and here are the two functions that are being called:

proposal(μ,σ) = Gamma(μ^2/σ^2,σ^2/μ)
logPost(data,model,σ) = MatSim.logNormal(data,model,σ) + sum(logpdf.(σ_Priors,model.σ)) + sum(logpdf.(ϵ_Priors,model.ϵ)) + logpdf(std_Prior,σ)

I’m not asking you to spend a lot of time optimizing my code. Just wondering if anyone can see any glaring problems that would help out. Also, just trying to learn the general tips/tricks for optimizing julia. I still don’t have good intuition/ habits for optimizing in julia so trying to build that.

Right now when I call this function 50,000 times, it’s allocating a little over 3.0 G and taking over a minute to execute. Feels like I could do better. Thanks in advance

lmiq · September 28, 2024, 10:54pm

These are abstract types. Probably is good to parameterize them (struct A{F} f::F end).

Except for this, which might or not affect your performance, there’s nothing obvious in terms of coding style.

ps: is Distribution concrete?

lancejnelson · September 28, 2024, 11:00pm

Not sure. How do you find out?

Are you saying to make another user-defined type for logpost and proposal or just parameterize LJ_metrop?

lmiq · September 28, 2024, 11:04pm

This one

lancejnelson · September 28, 2024, 11:09pm


struct LJ_metrop{T}
    proposal::T
    logpost::T

end

???

Then I call the constructor like this:

LJ_metrop{Function}(LP,prop)

lmiq · September 28, 2024, 11:14pm

No, each function has its own type, then:

struct LJ_metrop{F<:Function,G<:Function}
    proposal::F
    logpost::G

end

and you can call it as before. You might want also to parameterize the Distribution field, if it can be different types of distributions (if it is coming from the Distributions package, it is abstract):

julia> isconcretetype(Distribution)
false

Then, instead of using Distribution inside the struct definition, use:

struct LJ_metrop{F<:Function, G<:Function, D<:Distribution}
    proposal::F
    logpost::G
    ϵ_Priors:: Matrix{D}#Vector{Distribution}
    σ_Priors:: Matrix{D}
    std_Prior:: D

(if all the field expected the same distribution, otherwise use different parameters for each).

lancejnelson · September 28, 2024, 11:16pm

The <: syntax means that I don’t have to specify the type when calling the constructor? Also, why use both F and G rather than having both the function be of type F say?

lancejnelson · September 28, 2024, 11:21pm

By parameterizing the type, I’m essentially forcing the caller to specify the type of the fields which makes the user-defined type concrete even if the field types are abstract… Did I get that right?

lmiq · September 28, 2024, 11:26pm

You are saying that the field is one specific concrete subtype of the abstract type Function.

The <:Function is just to throw an error if the user tries to initialize the structure with something else:

julia> struct A{F}
           f::F
       end

julia> A(1.0)
A{Float64}(1.0)

julia> struct B{F<:Function}
           f::F
       end

julia> B(1.0)
ERROR: MethodError: no method matching B(::Float64)

Closest candidates are:
  B(::F) where F<:Function
   @ Main REPL[7]:2

Stacktrace:
 [1] top-level scope
   @ REPL[8]:1

julia> B(sin)
B{typeof(sin)}(sin)

lancejnelson · September 28, 2024, 11:27pm

Don’t think UpperTriangular is concrete either. This is going to increase the length of the constructor call with 4 types to specify…

lmiq · September 28, 2024, 11:29pm

This seems concrete (it is one specific subtype of UpperTriangular with those parametric types.

lancejnelson · September 28, 2024, 11:33pm

It didn’t speed things up but that was good information… Thank you… One more question. I tried parameterizing the functions, but not the distribution, like this:

struct LJ_metrop{F<:Function, G<:Function}
    proposal::F
    logpost::G
    std_Prior::Distribution
end

Then I checked to see if the type was concrete:

isconcretetype(LJ_metrop{Function,Function})

and it returned true???

lmiq · September 28, 2024, 11:39pm

It is a concrete type with abstract fields.

From the aspect of the code you should probably benefit from making everything concrete.

Don’t you have MWE?

lancejnelson · September 28, 2024, 11:40pm

Not up with your acronym. MWE?

lmiq · September 28, 2024, 11:43pm

You can probably improve these using

sum(p -> logpdf(p,model.σ), σ_Priors)

which will avoid the (I think) creation of the intermediate array.

lmiq · September 28, 2024, 11:43pm

Minimal working example.

Something others can run to test.

lancejnelson · September 28, 2024, 11:44pm

Minimal working example. Got it… The code is pretty long and complex, but I’ve shown you all the critical parts. There are several input/data files to read in so I don’t know if I could provide something that you could run.

lancejnelson · September 28, 2024, 11:46pm

It’s on github but I haven’t added the changes that I’ve discussed here. (git@github.com:lancejnelson/MatSim.jl.git)

lmiq · September 28, 2024, 11:47pm

If this function is the one to be optimized, it will be useful to have a set of inputs, even if with random data, that represents a single run. Then other people might jump in and take the last bit of performance from your code.

Topic		Replies	Views
Need help on improving performance of a small subfunction Performance	21	870	May 4, 2020
Improving Performance of a Loop Performance	9	416	November 3, 2021
Suggestions needed: speed up optimization Performance	31	1970	October 4, 2020
Performance of Computation of Expectation of slow function Optimization (Mathematical) jump , performance , optimization	13	1009	January 30, 2022
Nested Loop optimization Performance	2	418	September 20, 2022

Help with optimization

Related topics