using Distributions
order = 2
σ_Priors = Array{Distribution,2}(undef,order,order)
ϵ_Priors = Array{Distribution,2}(undef,order,order)
σ_Priors[:,:] .= Gamma(10,0.1)
ϵ_Priors[:,:] .= Gamma(10,0.1)
std_Prior = Gamma(10,0.3)
function totalEnergy(crystal::Crystal,LJ::LJ)
totalEnergy = 0
for i in eachindex(LJ.ϵ)
totalEnergy += -LJ.ϵ[i] * LJ.σ[i] ^6 * crystal.r6[i] + LJ.ϵ[i] * LJ.σ[i] .^12 * crystal.r12[i]
end
# @time "total Energy" totalEnergy = sum(-LJ.ϵ .* LJ.σ .^6 .* crystal.r6 .+ LJ.ϵ .* LJ.σ .^12 .* crystal.r12 )
end
function logNormal(data::MatSim.DataSet,LJ::MatSim.LJ,σ::Float64)
thesum = 0
for i = 1:data.nData
thesum += (data.crystals[i].energyFP -MatSim.totalEnergy(data.crystals[i],LJ))^2
end
thesum *= -data.nData/2 *log(σ^2) - 1/(2 * σ^2)
# @time "thesum" thesum = -data.nData/2 *log(σ^2) - 1/(2 * σ^2) * sum([(i.energyFP - totalEnergy(i,LJ))^2 for i in data.crystals])
return thesum
end
function logPost(data,model,σ)
# return 3.0
thesum = logNormal(data,model,σ) + logpdf(std_Prior,σ)
for i in eachindex(model.σ)
thesum += logpdf(σ_Priors[i],model.σ[i])
thesum += logpdf(ϵ_Priors[i],model.ϵ[i])
end
return thesum
end
@time logPost(trainingSet,LJ,0.3)
I recognize that I’m not giving you all of the code but I’m hoping it’s enough to go on. Currently a single call to logPost requires 34 allocations. Doesn’t seem too bad except that this function is being called hundreds of thousands of times. I’m convinced that this function is the performance bottleneck. Can anyone see any glaring problems here? Perhaps using globals for σ_Priors instead of passing them into the function?
is problematic. It turns out that not all of the distributions are identical. They may all be Gammas, but with different parameters. How could I avoid this problem in this case?
Does the gamma distribution use different types for different parameters? If they are all the same type, you could just use typeof(Gamma(10, 0.1)) for example. If you have a few different distribution types, wrapping them in a Union type could be fine. Alternatively, use LightSumTypes.jl as mentioned above
I don’t think so. Where that UpperTriangular type is coming from?
The main reason is probably the abstract Distribution type. If you only need distributions of type Gamma as elements of of the priors (or other of the same type with varying parameters), you should parameterize the type.
Can I ask one more kinda related question and then I’ll be quiet.
One “type” of array is Array{Float64,2}, meaning a 2 dimensional array filled with floats. Are users allowed to define types with integers in their type definition? If so, what does that look like?
struct myType{T,N}
x::T
...?
For best performance, user-defined types should be concrete when instantiated right? (Not sure I’m even asking this question right. I’m just wanting to build intuition for how to use types correctly and most efficiently. ). For example, I notice that isconcretetype(Matrix{Distribution}) is true, but I wouldn’t have thought it was concrete because Distribution isn’t concrete. I guess I’m asking when is a composite type deemed concrete and is that something that should be strived for.
struct ModelA{F,T}
σ::Vector{F}
ϵ::Vector{F}
σ_Priors::Matrix{T}
ϵ_Priors::Matrix{T}
end
I understand that parameterizing with a concrete doesn’t create a family of types because you’re at the bottom of the type tree, but is there no performance gain by doing this?