I’m having a really hard time finding a code structure that works for me. I’m building a pretty big library of interatomic potentials. I’ve broken the code up into several files:
One for each of the following sets of tasks:
Reading in crystal structure information and building associated datasets
Processing calculation files and extracting information.
Functions associated with Metropolis-Hastings algorithm and other machine learning stuff.
A Lennard Jones library (I know, simplistic… for now) that calculates energies and forces.
A Stillinger-Weber library that also calculates energies and forces.
A nested-sampling library for performing thermodynamic simulations.
Most of these separate files are just files of functions, not modules. But the LennardJones file and the Stillinger Weber files need to be modules because they both have functions named “totalEnergy” and “getForce” so I need them to be in separate namespaces. (i.e. I’d like to be able to do LJ.totalEnergy() and SW.totalEnergy()).
So I’ve built a parent module that looks like this:
module matsim
using StaticArrays
using Printf
using LinearAlgebra
using StatsBase
using Distributions
include("types/data_types.jl") # Load abstract data types.
include("libraries/Crystal.jl") # Load methods used to load and manipulate crystals.
include("libraries/dataset.jl") # Load methods used to load and manipulate entire data sets.
include("libraries/vaspUtils.jl") # Load methods used to interface with vasp
include("libraries/metrop.jl") # Load metropolis-hastings sampling files.
include("modules/LennardJones.jl") # Load Lennard Jones processing files.
include("modules/StillingerWeber.jl") # Load Stillinger-Weber processing files.
include("libraries/nestedSampling.jl") # Load nested sampling code
end
And then I load this module from the outer script. The problem I’m facing is that functions inside the LennardJones module use abstracttype “Crystal”. But since the functions inside LennardJones are in a separate namespace it doesn’t recognize that type (which was imported in the outer scope). In other words, there is a function inside of LennardJones like this
function totalEnergy(crystal::Crystal,params::Array{Float64,2})
energy = sum(-params[:,1] .* params[:,2] .^6 .* crystal.ljvals[:,1] + params[:,1] .* params[:,2] .^12 .* crystal.ljvals[:,2] )
return energy
end
and it doesn’t recognize the function type because it wants to use the “Crystal” type from within the LJ scope, not the outer scope from which it was originally imported.
I know this isn’t a working example and I’m happy to include more code. I’m just having a hard time building a mental framework for organizing big code in Julia… Any help is appreciated.
You can import from the parent module into the child module.
For example, say we have files “Foo.jl” and “bar.jl” with the following content
Foo.jl:
module Foo
abstract type MyAbstractType end
function show_values(x::MyAbstractType)
for property in propertynames(x)
@show getproperty(x, property)
end
end
include("bar.jl")
using .Bar
export Bar
export show_values
end # module Foo
bar.jl:
module Bar
using ..Foo: MyAbstractType
struct MyBarType <: MyAbstractType
a::String
b::Int
end
end
If you need an abstract type for interfacing different packages, you can have a separate package that just defines and exports this abtract type… It can then be implemented by different other packages…
It can be quite comfortable to put everything in one module / project, but I’ve found in my own work that I’m usually not splitting enough.
That is, make one module / package that holds any common types that are needed, make another (separate project, separate src directory etc), that deals with data loading, another for your data extraction and analysis etc.
You can even more-or-less keep them in a master project where you dev --local the other packages, but keeping separate packages, tests suits etc forces you to think in more atomic units, which in turn helps you to be explicit about organization and helps keep things straight.
But a function will always look for types that are inside it’s parent module right… So if I have the following module
module LJ
function totalEnergy(crystal::Crystal,params::Array{Float64,2})
energy = sum(-params[:,1] .* params[:,2] .^6 .* crystal.ljvals[:,1] + params[:,1] .* params[:,2] .^12 .* crystal.ljvals[:,2] )
return energy
end
end
and another module that defines the type:
module types
mutable struct Crystal
title::String
latpar::Float64
lVecs:: Matrix{Float64}
nType:: Vector{Int64} #Number of each type of atom
aType:: Vector{Int64} # Integers representing the type for each basis atom
nAtoms::Int64 # Total number of atoms in the unit cell
coordSys::Vector{String} # 'D' for Direct or 'C' for cartesian
atomicBasis:: Vector{Vector{Vector{Float64}}} # List of all the atomic basis vector separated by type
species:: Vector{String} # Species of the atoms
energyFP:: Float64 # First principles energy
energyPred:: Float64 # Model energy
order::Int64 # binary, ternary, etc.
ljvals:: Matrix{Float64}
end
end
And I load both of these modules in the outer script, my call to totalEnergy will fail because it’s looking for a type named “Crystal” that is inside it’s local module. What am I missing?
If you made test into a package or at least a submodule of a package, then you could do the following.
module LJ
using test: Crystal
function totalEnergy(crystal::Crystal,params::Array{Float64,2})
energy = sum(-params[:,1] .* params[:,2] .^6 .* crystal.ljvals[:,1] + params[:,1] .* params[:,2] .^12 .* crystal.ljvals[:,2] )
return energy
end
end
The main advantage of using packages is easing resolution of the name test. I would recommend choosing a different name.
module Foo
module Types
mutable struct Crystal
title::String
end
end # module Types
module LJ
using ..Types: Crystal
function totalEnergy(crystal::Crystal)
@show crystal.title
end
end # module LJ
end # module Foo
Then assuming you’ve imported module Foo, the following should work:
c = Foo.Types.Crystal("some text")
Foo.LJ.totalEnergy(c)
I’ve heard so many opinions about when its appropriate to define a new module. I’ve gone back and forth myself and just trying to settle on one paradigm so I can move forward. What is your mental checklist for deciding when to make a module?
Anytime I want to re-use code, I’ll put it into a module.
For self-contained or one-off scripts, I typically won’t make a module.
Just to overload with some more opinions (take with a grain of salt ):
You may want to consider leveraging subtypes and multiple dispatch to organize some of this code.
For example:
module Foo
export Crystal, LJPotential, total_energy
abstract type AbstractPotential end
struct Crystal{T<:Real}
energy::T
end
"""
total_energy(potential, crystal)
Compute the total energy of the crystal under the potential blah blah...
"""
total_energy(potential::AbstractPotential, crystal::Crystal) = error("need to define for concrete potential")
struct LJPotential{T_epsilon<:Real, T_sigma<:Real} <: AbstractPotential
epsilon::T_epsilon
sigma::T_sigma
end
function total_energy(potential::LJPotential, crystal::Crystal)
return potential.epsilon * crystal.energy
end
end # module Foo
If you know that in each “Potential” module, you would have a function called “total_energy” that has the same interface, then you could instead make each potential a different type, with any parameters of that potential as the fields of the type.
This makes things easier to use when want to compare things between different potentials, because you’re now passing around types rather than modules (namespaces).
Just a thought of course; I haven’t deeply considered what exactly you’re trying to do here.
I did not this see this part addressed earlier or elsewhere. Using modules in this way for two functions named totalEnergy is not the canonical approach for this kind of problem in Julia. Idiomatically, we would take advantage of multiple dispatch.
I would suggest the following pattern:
abstract type AbstractPotential end
struct LenardJonesPotential <: AbstractPotential end
struct StillingerWeberPotential <: AbstractPotential end
struct Crystal end
const LJ = LenardJonesPotential()
const SW = StillingerWeberPotential()
export LJ, SW
function totalEnergy(::LenardJonesPotential, c::Crystal)
1 # place holder
# LenardJones.totalEnergy(c)
end
function totalEnergy(::StillingerWeberPotential, c::Crystal)
2 # place holder
# StillingerWeber.totalEnergy(c)
end
You could then use them as follows where there would be a single totalEnergy function with two methods.
julia> totalEnergy(LJ, Crystal())
1
julia> totalEnergy(SW, Crystal())
2
julia> methods(totalEnergy)
# 2 methods for generic function "totalEnergy" from Main:
[1] totalEnergy(::StillingerWeberPotential, c::Crystal)
@ REPL[35]:1
[2] totalEnergy(::LenardJonesPotential, c::Crystal)
@ REPL[34]:1
I see. So here you’re really defining three functions in one by using the abstract type. But you then have to build the function to work for any of the given potentials right?