Code Organization

lancejnelson · March 29, 2024, 10:57pm

I’m having a really hard time finding a code structure that works for me. I’m building a pretty big library of interatomic potentials. I’ve broken the code up into several files:

One for each of the following sets of tasks:

Reading in crystal structure information and building associated datasets
Processing calculation files and extracting information.
Functions associated with Metropolis-Hastings algorithm and other machine learning stuff.
A Lennard Jones library (I know, simplistic… for now) that calculates energies and forces.
A Stillinger-Weber library that also calculates energies and forces.
A nested-sampling library for performing thermodynamic simulations.

Most of these separate files are just files of functions, not modules. But the LennardJones file and the Stillinger Weber files need to be modules because they both have functions named “totalEnergy” and “getForce” so I need them to be in separate namespaces. (i.e. I’d like to be able to do LJ.totalEnergy() and SW.totalEnergy()).

So I’ve built a parent module that looks like this:

module matsim
using StaticArrays
using Printf
using LinearAlgebra
using StatsBase
using Distributions

include("types/data_types.jl")  # Load abstract data types.
include("libraries/Crystal.jl")  # Load methods used to load and manipulate crystals.
include("libraries/dataset.jl")  # Load methods used to load and manipulate entire data sets.
include("libraries/vaspUtils.jl")  # Load methods used to interface with vasp
include("libraries/metrop.jl")  # Load metropolis-hastings sampling files.
include("modules/LennardJones.jl") # Load Lennard Jones processing files.
include("modules/StillingerWeber.jl") # Load Stillinger-Weber processing files.
include("libraries/nestedSampling.jl")  # Load nested sampling code
end

And then I load this module from the outer script. The problem I’m facing is that functions inside the LennardJones module use abstracttype “Crystal”. But since the functions inside LennardJones are in a separate namespace it doesn’t recognize that type (which was imported in the outer scope). In other words, there is a function inside of LennardJones like this

function totalEnergy(crystal::Crystal,params::Array{Float64,2})
    energy = sum(-params[:,1] .* params[:,2] .^6 .* crystal.ljvals[:,1] + params[:,1] .* params[:,2] .^12 .* crystal.ljvals[:,2] )
    return energy
end

and it doesn’t recognize the function type because it wants to use the “Crystal” type from within the LJ scope, not the outer scope from which it was originally imported.

I know this isn’t a working example and I’m happy to include more code. I’m just having a hard time building a mental framework for organizing big code in Julia… Any help is appreciated.

ufechner7 · March 29, 2024, 11:11pm

Always a good idea to create packages if you have a large code base. Did you think about doing that?

lancejnelson · March 29, 2024, 11:12pm

How will that solve my problem here?

awasserman · March 29, 2024, 11:14pm

You can import from the parent module into the child module.

For example, say we have files “Foo.jl” and “bar.jl” with the following content

Foo.jl:

module Foo

abstract type MyAbstractType end

function show_values(x::MyAbstractType)
    for property in propertynames(x)
        @show getproperty(x, property)
    end
end

include("bar.jl")
using .Bar

export Bar
export show_values

end # module Foo

bar.jl:

module Bar

using ..Foo: MyAbstractType

struct MyBarType <: MyAbstractType
    a::String
    b::Int
end

end

ufechner7 · March 29, 2024, 11:14pm

If you need an abstract type for interfacing different packages, you can have a separate package that just defines and exports this abtract type… It can then be implemented by different other packages…

This is my package hierarchy:
kite_power_tools

kevbonham · March 29, 2024, 11:15pm

It can be quite comfortable to put everything in one module / project, but I’ve found in my own work that I’m usually not splitting enough.

That is, make one module / package that holds any common types that are needed, make another (separate project, separate src directory etc), that deals with data loading, another for your data extraction and analysis etc.

You can even more-or-less keep them in a master project where you dev --local the other packages, but keeping separate packages, tests suits etc forces you to think in more atomic units, which in turn helps you to be explicit about organization and helps keep things straight.

At least, that’s been my experience.

lancejnelson · March 29, 2024, 11:24pm

But a function will always look for types that are inside it’s parent module right… So if I have the following module

module LJ

function totalEnergy(crystal::Crystal,params::Array{Float64,2})
    energy = sum(-params[:,1] .* params[:,2] .^6 .* crystal.ljvals[:,1] + params[:,1] .* params[:,2] .^12 .* crystal.ljvals[:,2] )
    return energy
end


end

and another module that defines the type:

module types

mutable struct Crystal
    title::String
    latpar::Float64
    lVecs:: Matrix{Float64}
    nType:: Vector{Int64} #Number of each type of atom 
    aType:: Vector{Int64} # Integers representing the type for each basis atom
    nAtoms::Int64  # Total number of atoms in the unit cell
    coordSys::Vector{String} # 'D' for Direct or 'C' for cartesian
    atomicBasis:: Vector{Vector{Vector{Float64}}}  # List of all the atomic basis vector separated by type 
    species:: Vector{String}  # Species of the atoms 
    energyFP:: Float64  # First principles energy
    energyPred:: Float64 # Model energy
    order::Int64 # binary, ternary, etc.
    ljvals:: Matrix{Float64}
end

end

And I load both of these modules in the outer script, my call to totalEnergy will fail because it’s looking for a type named “Crystal” that is inside it’s local module. What am I missing?

mkitti · March 29, 2024, 11:31pm

If you made test into a package or at least a submodule of a package, then you could do the following.

module LJ

using test: Crystal

function totalEnergy(crystal::Crystal,params::Array{Float64,2})
    energy = sum(-params[:,1] .* params[:,2] .^6 .* crystal.ljvals[:,1] + params[:,1] .* params[:,2] .^12 .* crystal.ljvals[:,2] )
    return energy
end


end

The main advantage of using packages is easing resolution of the name test. I would recommend choosing a different name.

awasserman · March 29, 2024, 11:31pm

Something like this should at least work:

module Foo

module Types

mutable struct Crystal
    title::String
end

end # module Types

module LJ

using ..Types: Crystal

function totalEnergy(crystal::Crystal)
    @show crystal.title
end

end # module LJ

end # module Foo

Then assuming you’ve imported module Foo, the following should work:

c = Foo.Types.Crystal("some text")
Foo.LJ.totalEnergy(c)

lancejnelson · March 29, 2024, 11:38pm

I think that might be what I’m missing. How would I access Crystal if the type definition was not in a module; just defined in the parent scope?

ufechner7 · March 29, 2024, 11:43pm

Why not putting type definitions in a module?

awasserman · March 29, 2024, 11:44pm

You probably will want to have all of your type and function definitions in some sort of module, as opposed to in a script or inputting into the REPL.

lancejnelson · March 29, 2024, 11:45pm

I’ve heard so many opinions about when its appropriate to define a new module. I’ve gone back and forth myself and just trying to settle on one paradigm so I can move forward. What is your mental checklist for deciding when to make a module?

awasserman · March 29, 2024, 11:57pm

Anytime I want to re-use code, I’ll put it into a module.

For self-contained or one-off scripts, I typically won’t make a module.

Just to overload with some more opinions (take with a grain of salt ):

You may want to consider leveraging subtypes and multiple dispatch to organize some of this code.

For example:

module Foo

export Crystal, LJPotential, total_energy

abstract type AbstractPotential end

struct Crystal{T<:Real}
    energy::T
end

"""
    total_energy(potential, crystal)

Compute the total energy of the crystal under the potential blah blah...
"""
total_energy(potential::AbstractPotential, crystal::Crystal) = error("need to define for concrete potential")

struct LJPotential{T_epsilon<:Real, T_sigma<:Real} <: AbstractPotential
    epsilon::T_epsilon
    sigma::T_sigma
end

function total_energy(potential::LJPotential, crystal::Crystal)
    return potential.epsilon * crystal.energy
end

end # module Foo

If you know that in each “Potential” module, you would have a function called “total_energy” that has the same interface, then you could instead make each potential a different type, with any parameters of that potential as the fields of the type.

This makes things easier to use when want to compare things between different potentials, because you’re now passing around types rather than modules (namespaces).

Just a thought of course; I haven’t deeply considered what exactly you’re trying to do here.

lancejnelson · March 29, 2024, 11:59pm

Thanks for the ideas. Much appreciated.

ufechner7 · March 30, 2024, 5:56am

Tooling support for modules is not-so-great, though, that’s why I suggested packages for code you want to use in different projects…

mkitti · April 4, 2024, 11:15pm

I did not this see this part addressed earlier or elsewhere. Using modules in this way for two functions named totalEnergy is not the canonical approach for this kind of problem in Julia. Idiomatically, we would take advantage of multiple dispatch.

I would suggest the following pattern:

abstract type AbstractPotential end
struct LenardJonesPotential <: AbstractPotential end
struct StillingerWeberPotential <: AbstractPotential end
struct Crystal end

const LJ = LenardJonesPotential()
const SW = StillingerWeberPotential()
export LJ, SW

function totalEnergy(::LenardJonesPotential, c::Crystal)
    1 # place holder
    # LenardJones.totalEnergy(c)
end
function totalEnergy(::StillingerWeberPotential, c::Crystal)
    2 # place holder
    # StillingerWeber.totalEnergy(c)
end

You could then use them as follows where there would be a single totalEnergy function with two methods.

julia> totalEnergy(LJ, Crystal())
1

julia> totalEnergy(SW, Crystal())
2

julia> methods(totalEnergy)
# 2 methods for generic function "totalEnergy" from Main:
 [1] totalEnergy(::StillingerWeberPotential, c::Crystal)
     @ REPL[35]:1
 [2] totalEnergy(::LenardJonesPotential, c::Crystal)
     @ REPL[34]:1

lancejnelson · April 4, 2024, 11:23pm

I like it… although I don’t see the need for the abstract type AbstractPotential. Wouldn’t everything you’ve done here work just fine without it?

mkitti · April 4, 2024, 11:33pm

Yes. There are a few things you could do with this though. For example, you could define a fallback or use this to list available potentials.

julia> struct MorsePotential <: AbstractPotential end

julia> listPotentialModels() = subtypes(AbstractPotential)
listPotentialModels (generic function with 1 method)

julia> listPotentialModels()
3-element Vector{Any}:
 LenardJonesPotential
 MorsePotential
 StillingerWeberPotential

julia> totalEnergy(p::P, c::Crystal) where P <: AbstractPotential =
           error("""totalEnergy is unimplemented for $P.
                    Please implement totalEnergy(::$P, ::Crystal)""")
totalEnergy (generic function with 3 methods)

julia> totalEnergy(MorsePotential(), Crystal())
ERROR: totalEnergy is unimplemented for MorsePotential.
Please implement totalEnergy(::MorsePotential, ::Crystal)

lancejnelson · April 4, 2024, 11:41pm

I see. So here you’re really defining three functions in one by using the abstract type. But you then have to build the function to work for any of the given potentials right?