Large programs: structuring modules and include to improve performance and readability

I am writing a large hydrological model, which I need to divide into many modules, each in its own file. What would be an efficient and readable approach to sharing the modules across these files?

I found that calling include("Filename.jl") in every module is clean, but not the fastest, especially when I need to perform loops (which then call include many times).

I found that an efficient way is to include the files once in the main program and then pass the modules around as variables (function arguments),

e.g. EVAPOTRANSPIRATION(Et, evaporation, transpiration, ...)

It works, but it is not elegant, so I am wondering whether there is a more elegant way of sharing modules.

Below is an example which works:

FILE: Main.jl

 module main
      include("Evaporation.jl")
      include("Transpiration.jl")
      include("EvapoTranspiration.jl")
 
      function MAIN(Et)
           for i =1:10
                return EvapoTranspiration = evapoTranspiration.EVAPOTRANSPIRATION(Et * i, evaporation, transpiration)
           end
      end
end

FILE: EvapoTranspiration.jl

module evapoTranspiration
     export EVAPOTRANSPIRATION

     function EVAPOTRANSPIRATION(Et, evaporation, transpiration)
          EvapoTranspiration = evaporation.EVAPORATION(Et) + transpiration.TRANSPIRATION(Et)
     end 
end

File: Evaporation.jl

module evaporation
     export EVAPORATION

     function EVAPORATION(Et)
          return Evaporation = Et * 0.3
    end
end

File: Transpiration.jl

module transpiration
     export TRANSPIRATION

     function TRANSPIRATION(Et)
          return Transpiration = Et * 0.5
     end
end

Generally, there is no reason to call include anywhere but at the top level. You definitely should not include files in a loop over and over. There is no point in it either: the definitions in those files are available once they have been included.
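For instance, a minimal sketch of the include-once pattern might look as follows. The file Evaporation.jl and the function evaporation are hypothetical, and the file is written to a temporary directory only so that the example is self-contained.

```julia
# Write a hypothetical source file so the example runs on its own.
path = joinpath(mktempdir(), "Evaporation.jl")
write(path, "evaporation(Et) = 0.3 * Et\n")

# Include it once, at top level: `evaporation` is now defined in this module.
include(path)

# Call the function as often as needed; there is no include inside the loop.
function total_evaporation(n)
    total = 0.0
    for i in 1:n
        total += evaporation(Float64(i))
    end
    return total
end

println(total_evaporation(10))
```

The loop only calls an already-defined function, so the cost of parsing and evaluating the file is paid once.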

It is not clear from your code whether you want/need submodules, but that’s what you effectively get above.

Whether you need submodules or multiple packages is up to you; I would say that around 5–10k LOC is the threshold for most people (and that’s a lot of code in Julia).

I would suggest partitioning the code into a package, which contains the reusable functionality, and scripts, which use this package to run things.


Dear Tamas, thank you for responding to my question. I agree that include should be called at the top level, which in this example is module main.

As I understand it, your question is whether we should put all the modules in one file and use a submodule architecture. Since we are dealing with large models covering different components of the water cycle, writing one big file would not be readable: in this example, Main.jl would contain all the modules (evaporation, transpiration, ...) in one big file. As I understand it, it is best to partition the different processes into different files.

You suggested creating packages. I agree that different tools should be partitioned into packages, but I do not think the scientific modules should be put into packages, as this would decrease the visibility of the code and make it harder to correct.

The question remains: what is a clean way to make the functions of the modules declared at the top level available to all the modules of the program, which are written in different files?


I think you are asking two relatively different (and largely orthogonal) questions:

  1. “physical” organization: should the source code be split into several files?
  2. “logical” structure: should the code be structured into (sub-)modules?

I would say that the answer to question 1 is almost certainly “yes”. As soon as your code starts growing, you’re probably better off splitting it into several source files. This is what include() is for.

This is largely orthogonal to modules. You can, for example, start with a single source file defining the top-level module of your project. When this file grows too big for your taste, you split it into several files and include these in the main file. The included files do not have to define new sub-modules; they may simply contain the code that was originally in the main source file.
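A minimal sketch of this, assuming a hypothetical top-level module HydroModel and two hypothetical files that contain plain function definitions (no module blocks). The files are written to a temporary directory only so the example runs on its own; include resolves a relative path against the directory of the file that calls it.

```julia
# Sketch: one top-level module (hypothetical name HydroModel) whose code is
# split into several files, without creating any submodules.
dir = mktempdir()

# Hypothetical process files: plain function definitions, no `module` blocks.
write(joinpath(dir, "evaporation.jl"), "evaporation(Et) = 0.3 * Et\n")
write(joinpath(dir, "transpiration.jl"), "transpiration(Et) = 0.5 * Et\n")

# The main file defines the module and includes the other files; the relative
# paths are resolved against the directory of HydroModel.jl itself.
write(joinpath(dir, "HydroModel.jl"), """
module HydroModel
    include("evaporation.jl")     # same as pasting the file's contents here
    include("transpiration.jl")
    evapotranspiration(Et) = evaporation(Et) + transpiration(Et)
end
""")

include(joinpath(dir, "HydroModel.jl"))
println(HydroModel.evapotranspiration(10.0))
```

Both functions end up directly in HydroModel, exactly as if they had been written in HydroModel.jl.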

As for question 2, you might also want to “logically” structure the code into (sub-)modules. Sub-modules help you group related features together, so that client code can issue a using SubModule statement and get access to everything exported by the module. Again, this “logical” structure is largely orthogonal to the “physical” organization of your sources into files: it would, for example, be perfectly legal to have a single source file defining the top-level module and all sub-modules. Nevertheless, it is customary (and good practice) to define each sub-module in a source file of the same name (which can then include other source files if needed).
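To illustrate that the logical structure does not depend on the file layout, here is a sketch of a single file defining a top-level module and two submodules. All module and function names are hypothetical; in a real project each submodule would usually live in its own file of the same name.

```julia
# Sketch: the top-level module and all its submodules in one source file.
# Module and function names are hypothetical.
module HydroModel
    module Evaporation
        export evaporation
        evaporation(Et) = 0.3 * Et
    end

    module Transpiration
        export transpiration
        transpiration(Et) = 0.5 * Et
    end

    # Bring everything exported by both submodules into HydroModel.
    using .Evaporation, .Transpiration

    evapotranspiration(Et) = evaporation(Et) + transpiration(Et)
end

# Client code can load just the group of features it needs.
using .HydroModel.Evaporation
println(evaporation(10.0))
println(HydroModel.evapotranspiration(10.0))
```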

Now a third question would be whether sub-modules defined in your project should rather be full-fledged packages. This would be the case if such sub-modules provide features that could be useful in several contexts. In that case, the source code for the sub-module should be put into a separate package, and this package should be added (with Pkg.add) as a dependency of your project.


Is this clear? If you can briefly describe the various components of your project, we might be able to help you determine an adequate way of organizing and structuring your source code.


To answer this specific question: if you want to define submodules (and, again, it remains unclear whether submodules are useful/needed in your case), then each part of the code wanting to use functions defined in another submodule should issue a using SubModule statement.

For example:

Main.jl

module Main
    # equivalent to copy-pasting the code contained in Evaporation.jl
    # since the code in question defines a module called Evaporation,
    # this will become a submodule Main.Evaporation
    include("Evaporation.jl")

    # Same as above
    include("EvapoTranspiration.jl")

    # This brings evapo_transpiration into scope
    using .EvapoTranspiration
    
    main() = println(evapo_transpiration(42.))
end

Evaporation.jl

module Evaporation
    # If some client code issues a `using Evaporation` statement,
    # the function `evaporation` will be brought into scope
    export evaporation

    # Actually define the function
    evaporation(x) = x
end

EvapoTranspiration.jl

module EvapoTranspiration
    export evapo_transpiration

    # Use a relative path to refer to the Evaporation module:
    # two leading dots mean that the module is defined as a submodule
    # of the current parent
    using ..Evaporation

    # `evaporation` can be used directly, since the Evaporation module
    # has been brought into scope
    evapo_transpiration(x) = 2 * evaporation(x)
end

@ffevotte already answered your question about code organization: just split the code into files and include them. For example, this is a typical layout I usually use.

Scientific code should also be put into packages (except for the runtime code). Julia packages are so lightweight that it doesn’t take much, and it will make your life much easier, eg you can use


Thanks ffevotte, you answered my question. Thanks to your help I have cleaned up my code and it works beautifully.

You explained that two leading dots mean that the module is defined as a submodule of the current parent (using ..Evaporation).

My question is: why do we use two dots (..) and not one (.)?

Thanks, I have included Revise.jl in the main module.


Great to see more biophysical models in Julia! We should start an organisation soon to get everyone talking more.

I also have some recommendations.

In Julia, code structure affects performance little, if at all: everything compiles the same way whether you have separate packages or one repository. Consider what that implies for collaborating and sharing code in these models. Sharing code is much easier than with Fortran and C/C++, so leveraging the community is nearly always the best strategy for math tools such as DifferentialEquations.jl, optimisers and quadratic solvers, which often seem to be custom-written in Fortran models. It also means we can work towards sharing other components, such as photosynthesis. Using external packages has essentially no cost.

If you must use one large repository, I would suggest using internal modules and making them as modular as possible, as is done in CLIMA. Your program structure then stays more manageable, and you can always move a module out into a separate package if external use cases emerge.

But I increasingly make separate packages for everything as I end up using modelling components in multiple projects, which means other people can use them too.

Lastly, the style guide describes some general patterns, and following them really helps readability for other Julia users: https://docs.julialang.org/en/v1/manual/style-guide/index.html


Everything is explained in this part of the documentation:

https://docs.julialang.org/en/v1/manual/modules/#Relative-and-absolute-module-paths-1

The simplest way I like to think of it is:

  • no dot means that the module is identified by an absolute path:

    • using MyModule looks for MyModule in the current environment
  • the first . switches from an absolute path to a relative one:

    • using .MyModule looks for MyModule as a submodule of the current module
  • every additional leading . goes up one level in the modules hierarchy:

    • using ..MyModule looks for MyModule as a submodule of the parent of the current module
    • using ...MyModule looks for MyModule as a submodule of the grandparent of the current module
    • using ..MyModule.MyOtherModule looks for a submodule named MyOtherModule, defined as a submodule of MyModule, itself defined as a submodule of the parent of the current module.
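These rules can be illustrated with a sketch of a small, entirely hypothetical module hierarchy: Model contains Evaporation, Processes and Report, and Processes contains Transpiration and Budget. There is one using statement per rule; LinearAlgebra is a real standard library, used here only to show the no-dot (absolute) case.

```julia
# No dot: absolute path, so LinearAlgebra is looked up in the current
# environment (it is a standard library that ships with Julia).
using LinearAlgebra: norm

module Model
    module Evaporation
        export evaporation
        evaporation(Et) = 0.3 * Et
    end

    module Processes
        module Transpiration
            export transpiration
            transpiration(Et) = 0.5 * Et
        end

        # One dot: Transpiration is a submodule of the current module (Processes).
        using .Transpiration
        # Two dots: Evaporation is a submodule of the parent of Processes (Model).
        using ..Evaporation

        et(Et) = evaporation(Et) + transpiration(Et)

        module Budget
            # Three dots: Evaporation is a submodule of Budget's grandparent (Model).
            using ...Evaporation
            half_evaporation(Et) = evaporation(Et) / 2
        end
    end

    module Report
        # Two dots plus a path: Processes is a submodule of Report's parent
        # (Model), and Transpiration is in turn a submodule of Processes.
        using ..Processes.Transpiration
        report_transpiration(Et) = transpiration(Et)
    end
end

println(Model.Processes.et(10.0))
println(norm([3.0, 4.0]))
```

Note that each relative path is resolved when the using statement runs, so the target module must already be defined at that point (here, Evaporation is defined before Processes).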

Does that make sense?


Thanks ffevotte for your detailed and useful explanations; it makes perfect sense.

I would definitely recommend submodules instead of plain include if you plan to have different people working on them. Modules in Julia serve to separate namespaces, and you don’t want name clashes between helper functions or constants, like a _compute_stuff defined in both evaporation and transpiration. Furthermore, this simplifies debugging.
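A sketch of the name-clash point: two submodules each define a helper called _compute_stuff (the name comes from the post above; the module names and function bodies are hypothetical).

```julia
# Sketch: the same private helper name in two submodules, without a clash.
module HydroModel
    module Evaporation
        _compute_stuff(Et) = 0.3 * Et          # helper private to Evaporation
        evaporation(Et) = _compute_stuff(Et)
    end

    module Transpiration
        _compute_stuff(Et) = 0.5 * Et          # same name, different meaning
        transpiration(Et) = _compute_stuff(Et)
    end

    # Each call resolves to its own module's helper.
    evapotranspiration(Et) = Evaporation.evaporation(Et) + Transpiration.transpiration(Et)
end

println(HydroModel.evapotranspiration(10.0))
```

If both files were instead included with plain include into a single module, the second definition of _compute_stuff(Et) would replace the first, because both methods have the same signature.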

Regarding export and using: these are pure syntactic sugar for when fully qualified names start hurting readability. Evaporation.evaporate(...) is much more readable if you have few call sites, because the reader doesn’t need to look up which module the function comes from; but the verbosity hurts readability if call sites are all over the place. I suggest always starting with plain import, and refactoring to using once you have many call sites.
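A sketch of that progression, reusing the Evaporation.evaporate name from the post; the function body and the client modules FewCallSites and ManyCallSites are hypothetical.

```julia
module Evaporation
    export evaporate
    evaporate(Et) = 0.3 * Et
end

module FewCallSites
    # `import` brings only the module name into scope, so every call is
    # fully qualified and the reader sees where `evaporate` comes from.
    import ..Evaporation
    daily(Et) = Evaporation.evaporate(Et)
end

module ManyCallSites
    # With many call sites, bring the function name itself into scope.
    using ..Evaporation: evaporate
    daily(Et) = evaporate(Et)
    weekly(Et) = 7 * evaporate(Et)
end

println(FewCallSites.daily(10.0))
println(ManyCallSites.weekly(10.0))
```

Listing the imported name explicitly (using ..Evaporation: evaporate) keeps some of the traceability of the qualified form, since a reader can still see at the top of the module where evaporate comes from.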

Regarding packaging and physical organization, think about whether a single git commit will commonly need to touch several of your modules. It sounds like that will be the case; hence I would recommend sticking to a single git repository and a single package to begin with. The same applies to the physical organization into source files: too many tiny files hurt readability, while too few huge files cause perpetual merge conflicts in git.


Thanks for your comments, which are really helpful.

Thanks, the style guide is helpful and your recommendations are useful.

Could you add your dot explanation to the Julia documentation? The last time I checked, it was not explained as well there.