How to modularize/package a large project | What kind of package/project directory structure works well?

I’m having trouble finding a directory structure that works well for a large Julia project.

I have two requirements:

  • A structure which is convenient for deployment
  • A structure which is convenient for development, specifically with VS Code

Allow me to explain in more detail.

A structure which is convenient for deployment should be simple and require minimal configuration to bootstrap an executable program

For example, it would be acceptable to set LOAD_PATH once with export JULIA_LOAD_PATH, adding at most one path. In other words, if an executable depends on O(N) Julia packages/modules, there should not be O(N) dependency paths to configure.
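To make the “one path” requirement concrete: Julia’s code loading treats a directory on LOAD_PATH as a package directory, where each subdirectory of the form Name/src/Name.jl can be loaded with using Name. The minimal sketch below assumes Julia 1.6 or later. It builds such a directory in a temporary location with two hypothetical packages, Common and Math, and loads both through a single LOAD_PATH entry. To keep it short, the packages have no Project.toml. Real packages would each carry one, listing what they use in its [deps] section.

```julia
# Minimal sketch: a single LOAD_PATH entry that serves several packages.
# `Common` and `Math` are hypothetical; a real deployment would use a fixed
# directory instead of mktempdir().
pkgroot = mktempdir()

# Each package lives at <pkgroot>/<Name>/src/<Name>.jl, the "package directory"
# layout that lets `using Name` find it once <pkgroot> is on LOAD_PATH.
sources = [
    "Common" => "module Common\nhelper(x) = 2x\nend\n",
    "Math"   => "module Math\nusing Common\noptimize(x) = Common.helper(x) + 1\nend\n",
]
for (name, src) in sources
    mkpath(joinpath(pkgroot, name, "src"))
    write(joinpath(pkgroot, name, "src", "$name.jl"), src)
end

# Same effect as starting Julia with JULIA_LOAD_PATH="<pkgroot>:" on Unix,
# where the trailing ':' keeps the default entries ("@", "@v#.#", "@stdlib").
pushfirst!(LOAD_PATH, pkgroot)

@eval using Math              # Math in turn loads Common from the same directory
println(Math.optimize(3))     # prints 7
```

However many packages sit under pkgroot, the deployment only has to configure that one entry.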

A structure which is convenient for development should work well with IDEs and editors

Here I have chosen VS Code, since it is the most commonly used general-purpose development environment and appears to have reasonably good support for Julia. Additionally, most of my team uses it, so I have to support it as a Tier 1 product.

Summary of my findings so far

Thus far, I have not been able to satisfy both requirements. I can either build a project structure which deploys easily, or I can build one which integrates well with VS Code. I do not seem to be able to do both simultaneously.

Therefore I am asking for advice.

I have tried multiple variants of project structure, but they come down to three main structures. The others I experimented with are slight modifications of these.

Single Julia Package

This is the simplest method, but I don’t think it scales well.

  • Create a single Julia package. That means there is a single Project.toml and only one package name. This single package might contain multiple “executables” (files with a main function), as well as serving as a single uber-library.

To run an “executable”, I guess one would have to launch julia with

$ julia --startup-file=no -e 'using UberPackage; UberPackage.MyModule.main(["ARG1", "ARG2"])'

Given my experience with Python, I have no objections to this way of launching an “executable” using the concept of an “executable module”.
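For concreteness, here is a minimal sketch of what such an executable module could look like. MyModule and its argument handling are hypothetical. Having main take the argument vector and return an exit code is a convention assumed here, not something Julia requires.

```julia
# Hypothetical "executable module": all it needs is a main(args) function.
module MyModule

# Returns a process exit code: 0 on success, 1 on a usage error.
function main(args::Vector{String})
    if isempty(args)
        println(stderr, "usage: MyModule ARG...")
        return 1
    end
    for (i, arg) in enumerate(args)
        println("argument $i: $arg")
    end
    return 0
end

end # module MyModule

MyModule.main(["ARG1", "ARG2"])   # prints "argument 1: ARG1" then "argument 2: ARG2"
```

Arguments given after the -e expression end up in ARGS. That means the shell’s real arguments can be forwarded, and the return value used as the exit status: julia --startup-file=no -e 'using UberPackage; exit(UberPackage.MyModule.main(ARGS))' ARG1 ARG2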

However, it is not clear to me how to subdivide this uber package.

Clearly, a single src directory with hundreds of source files is unworkable. Can related files be grouped into a common subdirectory?

UberPackage/
  Project.toml
  src/
    UberPackage.jl
    common/
      common.jl
      aux.jl
    math/
      math.jl
      optimize.jl

What would you do with the files under common and math? Just include each file from UberPackage.jl?

# UberPackage.jl
include("common/common.jl")
include("common/aux.jl")
include("math/math.jl")
include("math/optimize.jl")
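A common variant of the include approach wraps each subdirectory in its own submodule. This keeps names namespaced and makes dependencies between the groups explicit. In the sketch below, the include calls appear as comments, and hypothetical one-line functions stand in for the files’ contents.

```julia
# Sketch of UberPackage/src/UberPackage.jl with one submodule per subdirectory.
# scale and optimize are hypothetical stand-ins for the included files.
module UberPackage

module Common
    # include("common/common.jl")
    # include("common/aux.jl")
    scale(x) = 2x
end

module Math
    # include("math/math.jl")
    # include("math/optimize.jl")
    using ..Common: scale    # sibling submodules are reached with a relative path
    optimize(x) = scale(x) + 1
end

end # module UberPackage

println(UberPackage.Math.optimize(3))   # prints 7
```

Note that using UberPackage still loads and precompiles every submodule. This layout helps organization, but it does nothing for load time.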

It will (probably) work, but it’s not ideal.

The main problem I foresee with this is not being able to benefit from modularization.

If UberPackage contains everything, but some executables don’t need all of it, we don’t have modularization. Load times become very long, because even a small utility has to call using UberPackage and load all of the code, even if it only uses a small fraction of it.

Precompilation times will also be very long. This is a problem we already see. People are asking questions like “given that Julia takes so long to precompile and load, why didn’t we just use a compiled language to begin with?”

It’s a valid point.

  • Deployment is trivial: set LOAD_PATH to the location where UberPackage can be found.
  • Development is also trivial. VS Code can (I think) follow all of these include paths, so all dependencies between modules should be resolved.

Multiple packages in a single flat directory structure

Unlike the above, this structure does offer the benefits of modularity.

Rather than having a single UberPackage, each set of related concepts can be grouped into a package.

Here’s an example.

Common/
  Project.toml
  ... etc
Math/
  Project.toml
  ...
... other packages etc

  • Deployment is trivial. LOAD_PATH only needs to be set to a single path. All packages can be found in that path.
  • Development is broken. VS Code doesn’t have a way to set the Julia LOAD_PATH, so when working in the Math module, it will not know what Common is.

There is a way around this: use pkg> and the develop command to manually set up the dependencies between the packages.
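Roughly, that workaround looks like the following sketch, with each Pkg call mirroring a pkg> REPL command. It assumes Julia 1.6+ with its bundled Pkg. Common and Math are the hypothetical packages from the tree above, generated fresh in a temporary directory.

```julia
using Pkg

# Scratch location for two hypothetical sibling packages, Common and Math.
root = mktempdir()

cd(root) do
    Pkg.generate("Common")   # creates Common/Project.toml and Common/src/Common.jl
    Pkg.generate("Math")

    # Equivalent to `pkg> activate Math` followed by `pkg> develop <path to Common>`:
    # Common is added to Math's [deps] and tracked by path in Math/Manifest.toml.
    Pkg.activate("Math")
    Pkg.develop(path = joinpath(root, "Common"))

    Pkg.activate()           # switch back to the default environment
end
```

Every package that uses Common needs its own develop step like this one. That per-package bookkeeping is what has to be redone whenever packages are split, merged, or moved.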

But this creates total hell when trying to refactor anything. This kind of explicit, manual dependency management is exactly what package managers are supposed to spare us. What we end up with is a fragile, brittle project structure, which can easily break and will likely be very hard to fix when it does.

Tree of Packages

To be honest, I’m not sure mentioning this is a good idea. I really don’t like it at all, but I have seen it suggested before, so I thought it worth raising.

RootPackage/
  Project.toml
  Common/
    Project.toml
    ..
  Math/
    Project.toml
    ...

The reason why I dislike this should be obvious:

  • Lost modularity: you can’t pick and choose a subset of packages to use in an application.
  • Packages should be isolated things that depend on other packages. Having one package which “contains” another package doesn’t make a lot of sense to me. A tree of modules is reasonable and sensible; a tree of packages is not.
  • This isn’t how packages are organized when installed in the JULIA_DEPOT_PATH.
  • Messy directory structure.

With regard to development and deployment:

  • Development seems to work, VS Code finds everything correctly
  • Deployment is broken. This might be because I haven’t set some of the dependencies correctly; I’m not sure. I couldn’t find a way to get it to work and gave up relatively quickly, because I don’t think this is a sensible solution.

Summary

Does anyone have any advice? Is there an “official” or “approved” solution to this problem?

Basically we want 3 things.

  1. To break down an existing large project into smaller components, which will also allow us to add unit tests for those smaller components.
  2. For whatever solution we use to work with VS Code, meaning that VS Code can recognize the dependencies.
  3. Modularity, meaning that a small utility should not have to load all of the code in the git repo to be able to run. (In other words, avoid long precompile times when precompiling most of those files is pointless. We cannot turn precompilation off entirely; that would cause a latency problem and break our deployment in other, more subtle ways.)

Thanks in advance