Single module vs. submodules in a project

erans · February 15, 2021, 11:23pm

I have been reading a fair amount (a few references at end of post) about how to “correctly”, or better yet idiomatically, structure code in Julia. In general, there seems to be a strong sentiment that many submodules probably means the code could be better split into several packages.

My question is: wouldn’t breaking code into submodules help with precompilation (assuming that most of the submodules are not updated constantly)?

I should note that I don’t fully understand all the intricacies of JIT compilation. It’s therefore quite possible that my assumptions below are incorrect, in which case, cool: no submodules for me.
My sense is that the reason to use Julia is to leverage these sorts of subtleties, which is where the performance gains are achieved, so I want to make sure I’m doing it right.

Example

Say I have a package MyPkg:

module MyPkg
include("A.jl")
include("B.jl")
end

# contents of A.jl
fooA(x) = x + 2
[...]
# contents of B.jl
fooB(x) = 2*fooA(x)
[...]

Now let’s assume that:

fooA (and the rest of A.jl) is unlikely to change very frequently, while
I’m still tweaking fooB and might even want to add a few more things.
Finally, the contents of A.jl have no real meaning on their own (silly example above aside, maybe they set up some specialized types for the calculations in B.jl, or are utility functions tailored to the functions in B.jl, etc.).

My current understanding is that in the construction above, any change in B.jl would require precompiling all of MyPkg again (upon restarting the REPL), which necessarily includes the contents of A.jl.

I could restructure the code as follows:

module MyPkg
include("A.jl")
include("B.jl")
end

# contents of A.jl
module A
export fooA
fooA(x) = x + 2
[...]
end
# contents of B.jl
module B
using ..A  # A is a sibling inside MyPkg, so go up one level with ..
fooB(x) = 2*fooA(x)
[...]
end

If I now make a change to B.jl, wouldn’t only MyPkg and B need to be recompiled, while the already precompiled A could be used as is?
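To make the sibling-submodule layout concrete, here is a single-file sketch where the module blocks are written inline instead of pulled in with include (include evaluates a file's contents at that point, so the nesting is the same). It illustrates one detail of the layout: because A and B are siblings inside MyPkg, B has to reach A with using ..A (go up to MyPkg, then into A), while using .A would look for MyPkg.B.A and fail. The names follow the example above; the function bodies are placeholders.

```julia
module MyPkg

# Submodule A: rarely changing utilities
module A
export fooA
fooA(x) = x + 2
end # module A

# Submodule B: the code still being tweaked
module B
using ..A      # ".." climbs from MyPkg.B to MyPkg, then enters A
export fooB
fooB(x) = 2 * fooA(x)
end # module B

using .B       # a single "." looks inside MyPkg itself
export fooB    # re-export so users of MyPkg see fooB directly

end # module MyPkg

using .MyPkg
fooB(1)  # returns 6
```

As the replies below point out, this nesting by itself does not give A a separate precompile cache; the package remains the precompilation unit.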

Some references

Beyond the Julia documentation (specifically Modules and Code Loading) and the Pkg documentation, there are also several interesting discussions on this forum about the question (many quite recent):

Tamas_Papp · February 16, 2021, 3:05pm

You may not need to worry about this unless your codebase is relatively large. Up to 5–10 kLOC (and that’s a lot of code in Julia), a single module/package often works fine.

I am not sure there is any mechanism that would have this effect. If compilation time is a concern, just investigate and fix that directly. These excellent blog posts are a good starting point:

jebej · February 16, 2021, 4:06pm

I’m not 100% sure, but I don’t think so. As far as I understand, the precompilation unit is the “package”, so if anything changes within the package, even if in a submodule, the whole package is recompiled.

MA_Laforge · February 16, 2021, 5:23pm

Submodules → Several Packages

I think you are correct in that the message being sent out in the community does sound the way you summarized it. However, I believe it stems from our inability to concisely express ourselves.

Software designers use modules as a means to encapsulate a software component, and hide implementation details. Julia doesn’t really do information hiding, so you need more explicit means of communicating your public interface to the user, as well as how to use your software component effectively.
With most programming languages, module encapsulation is typically done (in part) by wrapping code from each “module” in its own namespace. This ensures functions/variables/constants… will not collide with those of the outside world.
In Julia, you instead leverage multiple dispatch by judiciously writing function signatures. Thus, with multiple dispatch, you worry more about method collisions than about function name collisions.
- Before namespaces and multi-dispatch, you would have to write something like stackpush_i64(list,val), and stackpush_f64(list,val), etc. to avoid name collisions.
- After namespaces, you could write datastructures::i64stack::push(list, val) to access the push function of the Int64 stack object found in the datastructures library. This way, all stacks could share the same name for their interface without worrying about name collisions.
- Similarly, in object-oriented languages, you can create a stack “object” with mystack=datastructures::i64stack::stack(), then add to it using mystack.push(val).
- In Julia, generic (parameterized) stack objects basically come for free, and the programmer has very intuitive facilities to communicate type information to the compiler (no need for the intricate tree of namespaces found in other paradigms).
As you might have noticed, the advantage of a flatter namespace hierarchy is that users of your package don’t have to fully qualify the.path.to.your.sub.module (which often makes things more readable).
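A minimal sketch of the generic-stack point: one parameterized type plus new methods on Base’s existing push!/pop! functions, so there is no stackpush_i64 and no i64stack namespace. The Stack type here is hypothetical (not from any package) and is just a thin wrapper around a Vector.

```julia
# Hypothetical generic stack: one type, parameterized by element type T
struct Stack{T}
    items::Vector{T}
end
Stack{T}() where {T} = Stack{T}(T[])

# Add methods to Base's generic functions instead of inventing new names;
# dispatch selects these because the first argument is a Stack
Base.push!(s::Stack, val) = (push!(s.items, val); s)
Base.pop!(s::Stack) = pop!(s.items)
Base.isempty(s::Stack) = isempty(s.items)
Base.length(s::Stack) = length(s.items)

ints = Stack{Int64}()
floats = Stack{Float64}()
push!(ints, 1); push!(ints, 2)
push!(floats, 1.5)
pop!(ints)  # returns 2 (last in, first out)
```

The same push! name now serves Vector, Stack{Int64}, Stack{Float64}, and any other type that adds a method; the thing to avoid is two packages defining a method with the same signature.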
In Julia, you still have to worry about name collisions for variables and constants, though:
- But since developers have full control over the entire package they are building/modifying, they can typically manage name collisions effectively.
- At times, it might still be very practical to create “sub-modules” (different namespaces) to deal with these collisions and/or collect software components under a meaningful sub-module name. So I think this is where the community’s message about sub-modules can get misinterpreted.
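For the case in the last bullet, a sketch of submodules used purely as namespaces so that two constants with the same name can coexist. The module names Physics, Optics, and Acoustics and the constant SPEED are made up for illustration.

```julia
module Physics

# Two components that each want a constant called SPEED
module Optics
const SPEED = 299_792_458.0  # speed of light in vacuum, m/s
end

module Acoustics
const SPEED = 343.0          # approximate speed of sound in air at 20 degrees C, m/s
end

# Qualifying with the submodule name keeps the two constants apart
light_delay(dist) = dist / Optics.SPEED
sound_delay(dist) = dist / Acoustics.SPEED

end # module Physics

Physics.sound_delay(343.0)  # returns 1.0
```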

I’m still working out how to effectively communicate more optimal ways to leverage Julia’s facilities (not to mention that I’m still figuring it out). That being said, here are a few tips on how I structure my Julia code:

I no longer break out different software components to separate “modules” (because Julia modules are really just namespaces).
I still try to write out different software components to separate files/folders so I/other developers can easily locate them.
I only create Julia packages when I want to “bundle” up a collection of software components that are generic enough to be reusable by more than one project/application.
Again: if a software component is not generic enough to be used by other projects/applications, then it simply exists as separate files/folders, which are directly include()-ed in the project/application requiring it. The idea is to help other developers/myself understand the logical structure of the software, and find the relevant code.

MA_Laforge · February 16, 2021, 5:49pm

As @jebej mentioned, I’m pretty certain the precompilation unit is the “package”, not the “module”.

That being said, from a practical standpoint, I do like your idea of splitting out code that seldom changes but takes a lot of time to compile (assuming that this really is the case).

The only adjustment I would make is to write it to a separate package (not a submodule).

Note that you might eventually want to merge the two packages back together to avoid registering two packages (and avoid dealing with two separate .git repositories to house your code). I say this because Julia packages are registered at the .git repository level (sort of; there are workarounds here, but I think they need a bit of streamlining).

Pitfalls to precompilation

To take full advantage of precompilation though, you might need to do a bit of research:

I think some portions of precompilation only get triggered when a user of your package calls a specific method in your package (i.e. calls a function with a specific set of argument types).
The good news is that I’m pretty certain there are ways to trigger the compilation of these specific methods using PackageCompiler.jl so that they get included in the precompilation image of your base package.
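A related tool for this specific issue (different from PackageCompiler.jl, which is what the post suggests) is an explicit precompile directive in the package body. When the module is precompiled as a package, code compiled for those exact signatures can be stored in the package’s cache instead of being compiled on the user’s first call; how much is cached depends on the Julia version. A sketch reusing the hypothetical fooA/fooB functions from the opening post, assuming Int and Float64 are the argument types users will call with:

```julia
module MyPkg

export fooA, fooB

fooA(x) = x + 2
fooB(x) = 2 * fooA(x)

# Compile these concrete signatures now; when MyPkg is precompiled as a
# package, the results can go into its cache rather than being compiled
# on first use. Each call returns true on success.
precompile(fooB, (Int,))
precompile(fooB, (Float64,))

end # module MyPkg
```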

Visibility issue with this thread

You might also want to:

Check out discourse archives for more info on precompilation.
Possibly change the title of this thread.

It sounds to me like you are really looking for help with precompilation, but that isn’t obvious from the title.

MA_Laforge · February 16, 2021, 5:58pm

Oh, and I realize that my suggestion of splitting out code that won’t be used by any other project/package/application goes directly against the post I wrote immediately above it.

In this case however, you have a practical reason to do this (reduce development time), despite it being unnecessary from a structural standpoint for code organization.

paulmelis · February 16, 2021, 6:26pm

Looking at the contents of ~/.julia/compiled/v1.5, the results are apparently stored per module. And a package, as far as I can see, is nothing more than a module structured in a specific way. I would even venture (but don’t know) that the compiler doesn’t distinguish between the two and only Pkg.jl is aware of packages.

~~Anyway, did some testing:~~

Edit: redid the checks for caching of the compiled files
Edit 2: per comment #8 by jebej below, none of the above makes any sense. A real submodule uses a relative form of using, and then it does not show up as a separate cached file.
Edit 3: deleted the example code which was showing the wrong thing, as Discourse doesn’t support strikethrough of code it seems (even though the edit preview did show it)

jebej · February 16, 2021, 7:29pm

You are not using submodules here; you are just creating two individual “packages”, though because you are pushing to LOAD_PATH these things are less clear.

A submodule must be defined within the scope of a parent module, like in the example here copied below, or by including the file containing the submodule within the parent module. The module blocks must be nested.

module Parent

module Utils
...
end

using .Utils

...
end

If you want to do that precompilation test properly with submodules, you would need the following:

module MyModule # MyModule.jl 

include("MySubModule.jl")
using .MySubModule # note the dot

export f

f(a) = 2*g(a)  # g is defined and exported by MySubModule

end
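The snippet above assumes a file MySubModule.jl that defines and exports g. The sketch below supplies a hypothetical version of that file (g(a) = a + 1 is made up) and, to stay self-contained, writes it to a temporary directory first; the parent module then includes it as above. Because include evaluates the file inside MyModule, the result is nested as MyModule.MySubModule, which is why using .MySubModule needs the leading dot.

```julia
# Write a hypothetical MySubModule.jl to a temporary directory
const SUBMODULE_FILE = joinpath(mktempdir(), "MySubModule.jl")
write(SUBMODULE_FILE, """
module MySubModule
export g
g(a) = a + 1
end
""")

module MyModule # stands in for MyModule.jl

# SUBMODULE_FILE was defined at the top level (Main) above; in a real
# package this line would just be include("MySubModule.jl")
include(Main.SUBMODULE_FILE)
using .MySubModule # note the dot: look inside MyModule

export f

f(a) = 2*g(a)

end # module MyModule

using .MyModule
f(3)  # returns 8
```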

As a side note, pushing to the LOAD_PATH is not the best way of organizing code, since you won’t be making use of dependency resolution and reproducible environments you can do with a Project.toml file and using Pkg.

paulmelis · February 16, 2021, 8:11pm

Doh! And it even works the same in Python, so silly of me not to realize that. I’ll amend the comment above.

erans · February 16, 2021, 8:15pm

Thanks everyone for the input. This has cleared several things up for me.

From looking at the results in the ~/.julia/compiled/v1.5/ folder, I can see that submodules are indeed not precompiled separately. @paulmelis, thank you very much for taking the time and effort to run that test. I believe @jebej is correct, though, that the issue is that you are using LOAD_PATH (by the way, avoiding LOAD_PATH is the primary reason I’ve fallen into this rabbit hole).

These are great, and they give me a lot to think about w.r.t. precompilation.

Thanks for all the tips. Regarding this specific point, though, my main interest currently is the best way to structure code. My thought was that the “best” way would be the one that best leverages Julia’s capabilities. As far as I can tell, the two main Julia highlights are JIT compilation and multiple dispatch. With that in mind, I was/am trying to figure out how my code structure can best play nice with these concepts.

Thanks again to everyone for the good feedback!

paulmelis · February 16, 2021, 8:16pm

Yep, see my updated comment above

erans · February 16, 2021, 8:18pm

Whoops, I must have been a minute too quick with my post.

jebej · February 16, 2021, 8:26pm

As evidenced by the multiple posts on this issue, code organization is quite a tricky thing. I think many people were using the pre-1.0 way of simply having code on the LOAD_PATH, and never transitioned to the new method (I know I didn’t get it for a while!).

It is a little more work and complexity, since individual packages now need a Project.toml and must be added with Pkg.develop (] dev) to the global environment if you want to simply type using MyPackage after starting julia.

erans · February 16, 2021, 8:49pm

Indeed.

I’m actually leaning a bit more towards simply running ] activate <path to my project>, which should still make using MyPkg possible, no?

This seems to be a bit simpler (at least for more of an “application” use case) than the dev option. But I’m happy to have it explained otherwise.

jebej · February 16, 2021, 8:57pm

Yes, if you only need to use that particular package within its own environment, then simply activating that environment works!

I like to have a bunch of the packages I work on be available at the command line when I need to use them for quick commands, without having to manually activate an environment.
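For reference, ] activate has a functional counterpart in the Pkg API, Pkg.activate. The sketch below (the package name DemoApp and the temporary directory are assumptions for illustration) generates a throwaway package, activates its environment, and loads it with using, without developing it into the global environment. This works because the active project itself can be loaded by name.

```julia
using Pkg

# Create a throwaway package (Project.toml + src/DemoApp.jl) in a temp dir
project_dir = joinpath(mktempdir(), "DemoApp")
Pkg.generate(project_dir)

# Equivalent of `] activate <path to my project>` in the REPL
Pkg.activate(project_dir)
println(Base.active_project())  # path to DemoApp's Project.toml

# The active project can be loaded by name, no `] dev` needed
using DemoApp
```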

MA_Laforge · February 16, 2021, 11:54pm

If you go that route, you might prefer creating a simple shell script that launches julia with the --project=/path/to/my/project argument.

You can also check out my module “ConventionalApp.jl”:
→ [ANN] ConventionalApp.jl

You don’t HAVE to use ConventionalApp.jl. It is basically just a utility module to help you create bash files, etc. But it gives you a solution on how conventional applications can be “generated”/launched in a somewhat practical manner.

Note that ConventionalApp.jl also cleans up the LOAD_PATH (removes unnecessary "@v#.#" “project” from the environment stack).

MA_Laforge · February 16, 2021, 11:59pm

If that’s the case, I suggest you take a look at my post here (and those it links to):

lmiq · February 17, 2021, 12:14am

And if one is in the project directory, it suffices to start Julia with

julia --project

Since it is somewhat natural to navigate to the working directory to code, that is quite practical.
