Okay, quite a lot to unpack here. Some quotes-with-answers are deliberately out of chronological order for better presentation.
I’ll start off by apologising if I’ve come across the wrong way. I certainly don’t mean to offend anyone. Clearly I have a controversial opinion – I am trying to express disagreement without derogation.
I’ll restate that for emphasis: I absolutely don’t mean to cause offense.
Thanks for the welcome! Okay, let’s get into the meat of this.
I’m constructing a module/package/some large blob of code.
I have two files `A.jl` and `B.jl`, which depend upon some common functionality. The typical pattern is to factor this out into some other file, in my case often with an unimaginative name like `utils.jl`.
In order for `A.jl` and `B.jl` to see the definitions of `utils.jl`, they must both `include("utils.jl")`. This poses a problem: they cannot both perform this inclusion. Eventually both `A.jl` and `B.jl` will themselves get included somewhere, and then `utils.jl` has been included twice. This is the problem of duplicated definitions.
For example, if this occurs within some module hierarchy, then we can end up with two distinct copies of the contents of `utils.jl`, contained within different modules. This isn’t a huge issue if `utils.jl` only defines pure functions, but if `utils.jl` defines some types, with functions dispatching based upon these types, then the copies are mutually unintelligible: you cannot dispatch to functions defined in one copy using the type defined in the other.
The solution is apparently to include both `A.jl` and `B.jl` in some other file, say `entry_point.jl`, and require that `entry_point.jl` will `include("utils.jl")` on `A.jl` and `B.jl`’s behalf. Indeed this is the standard pattern within several major projects, and I imagine the pattern that most people here are familiar with.
Unfortunately, this has its own problem: `A.jl` and `B.jl` are no longer self-contained. If `A.jl` wishes to use some function `foobar()` defined in `utils.jl`, then it simply uses it without qualification, trusting that it will be made available for it. This is the problem of not being self-contained, which means that the dependency structure between files is not made explicit.
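A rough sketch of this second pattern, again with hypothetical names throughout (`Point`, `make_point`, `double`, and the modules `EntryPoint` and `WrongOrder` are all made up for illustration), and with `include_string` standing in for `include` so that no files are needed:

```julia
# Hypothetical file contents, held in strings so the sketch needs no files.
utils_src = """
struct Point
    x::Float64
end
"""
a_src = "make_point() = Point(3.0)"          # A.jl: uses Point, never says where it comes from
b_src = "double(p::Point) = Point(2 * p.x)"  # B.jl: likewise

# entry_point.jl: one shared namespace, with utils included first on behalf of A and B.
module EntryPoint
end
include_string(EntryPoint, utils_src)
include_string(EntryPoint, a_src)
include_string(EntryPoint, b_src)

result = EntryPoint.double(EntryPoint.make_point()).x

# Same files, wrong order: B.jl is evaluated before Point exists, so it errors.
module WrongOrder
end
wrong_order_failed = try
    include_string(WrongOrder, b_src)
    false
catch
    true
end
```

Nothing in `a_src` or `b_src` records that they need `utils_src`; they only work because the entry point happens to evaluate it first, and `b_src` on its own fails.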
This implies several problems:
- The code becomes harder to read, and to reason about: each file is implicitly assumed to be executed in some unspecified context.
- It is harder to locate the functionality you are depending upon; as others have noted above this typically requires something like IDE support to track down.
- Additional manual labour is required to ensure that `entry_point.jl` runs its `include`s in the correct order.
- It becomes harder to locate old/dead code that isn’t depended upon by anything.
And moreover these issues are generally exacerbated once multiple developers are involved.
I don’t think these issues are controversial – from earlier in this thread:
@oxinabox: “… It’s a fair complaint.”
@aplavin: “one of inconveniences with the current include system is that there is literally no way to tell what are the dependencies of a specific source file”
(If either of you feel I’m misrepresenting your point of view here then do please let me know and I’ll take it out.)
So whilst the limitations of this approach are to some degree manageable, they are limitations, and ones with increasing bite as project size grows. It is not overstating my position to say that I think this is the single biggest limitation to work around when using the Julia language; at least that I’m aware of.
As an explicit example, try having a look through the source code for PyTorch. The Python bits (which follow the first pattern) are generally easy to follow. The C++ bits (which follow something akin to the second pattern) are generally difficult to follow.
Do note that ultimately this is all an issue about handling files – not modules, nor packages. (Despite the title of this thread – the focus on modules has been because they can be used as a potential solution.)
So what is the solution? (Beyond just putting up with it.) As far as I can tell, until now there hasn’t been one. PatModules.jl is one (deliberately simple) approach, but not one that I’m particularly wedded to. I think if a solution to this problem made its way into the language as a whole I’d probably advocate for a different, more sophisticated option. But I shan’t get into that now – let’s focus on establishing whether there is an issue or not first.
Does that all make sense? What are your thoughts?
Correct – because both are installed as packages. (In this scenario Julia keeps a global reference of all imported packages and re-uses them if possible.) This discussion / my point is focused solely on the construction of a single package (or more generally some complicated blob of code), and ways to split code across multiple files when doing so.
Quite possibly I am wrong. I haven’t been convinced otherwise yet, but I promise you I am reading every reply, and trying not to be a zealot about anything.
I spent a fair bit of time searching around looking at existing solutions to this problem, and existing thoughts on how things may be improved:
Current recommended best practice 1
Current recommended best practice 2
Current way of performing relative imports
An example of what is done in existing major packages
A comparison to C++ (a language with the same basic issue)
#4600: a potential change, but not really a fix
With the general overview being that (a) the problem exists, (b) it has already been acknowledged, but (c) there are at present no good solutions.
Phew, that was a long post. Thank you to those that read it in its entirety.
PS: And since I didn’t comment on it earlier:
Thank you! It’s very flattering to be recognised “in the wild”.