[ANN] PatModules.jl: a better module system for Julia

I’m not the OP, but one of the inconveniences with the current include system is that there is literally no way to tell what the dependencies of a specific source file are. So one needs to dig through other files (GitHub search helps here, of course) in order to copy a part of the code to another project.
An obvious alternative could be to require each file to be a self-contained “module”, even without explicit top-level module declaration.

I’d suggest all participants take a break on this thread until the New Year.

17 Likes

Welcome, @patrick-kidger! Rather than take the opinion of StackOverflow user395760 as a given, it would be good to understand why the current design is problematic in your view. What concrete issues have you observed it causing?

10 Likes

I like that include is simply a source-code management tool (e.g. breaking a big file into a bunch of smaller related parts) that doesn’t impact system architecture. This is a feature, not a deficiency.

7 Likes

This doesn’t sound like the right way of working. You should use import AwesomeModule: functiontoborrow syntax instead.

3 Likes

Okay, quite a lot to unpack here. Some quotes-with-answers deliberately out of chronological order for better presentation.

I’ll start off by apologising if I’ve come across the wrong way. I certainly don’t mean to offend anyone. Clearly I have a controversial opinion – I am trying to express disagreement without derogation.

I’ll restate that for emphasis: I absolutely don’t mean to cause offense.

Thanks for the welcome! Okay, let’s get into the meat of this.

I’m constructing a module/package/some large blob of code.
I have two files A.jl and B.jl, which depend upon some common functionality. The typical pattern is to factor this out into some other file, in my case often with an unimaginative name like utils.jl.

In order for A.jl and B.jl to see the definitions of utils.jl, they must both include("utils.jl"). This poses a problem: they cannot both perform this inclusion. Eventually both A.jl and B.jl will themselves get included somewhere, and then utils.jl has been included twice. This is the problem of duplicated definitions.

For example, if this occurs within some module hierarchy, then we can end up with two distinct copies of the contents of utils.jl, contained within different modules. This isn’t a huge issue if utils.jl only defines pure functions, but if utils.jl defines some types, with functions dispatching based upon these types, then the copies are mutually unintelligible: you cannot dispatch to functions defined in one copy using the type defined in the other.
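To make the duplication concrete, here is a minimal sketch. The file contents, the `Token` type, and the module names `CopyOne` and `CopyTwo` are all made up for illustration; the source is held in a string and evaluated with `include_string` so that the example runs without any files on disk.

```julia
# Hypothetical contents of utils.jl, kept in a string for a self-contained sketch.
const SHARED_SRC = """
struct Token
    value::Int
end
describe(t::Token) = "token \$(t.value)"
"""

# Two modules that each "include" the same source, as A.jl and B.jl would.
module CopyOne end
module CopyTwo end
include_string(CopyOne, SHARED_SRC)
include_string(CopyTwo, SHARED_SRC)

tok = CopyOne.Token(1)

# The two Token types share a name but are different types.
println(CopyOne.Token === CopyTwo.Token)

# Dispatch works within one copy but not across copies.
println(applicable(CopyOne.describe, tok))
println(applicable(CopyTwo.describe, tok))
```

Calling `CopyTwo.describe(tok)` directly would raise a `MethodError`, which is the "mutually unintelligible" situation described above.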

The solution is apparently to include both A.jl and B.jl in some other file, say entry_point.jl, and require that entry_point.jl will include("utils.jl") on A.jl and B.jl's behalf. Indeed this is the standard pattern within several major projects, and I imagine the pattern that most people here are familiar with.

Unfortunately, this has its own problem: A.jl and B.jl are no longer self-contained. If A.jl wishes to use some function foobar() defined in utils.jl, then it simply uses it without qualification, trusting that it will be made available for it. This is the problem of not being self contained, which means that the dependency structure between files is not made explicit.
This implies several problems:

  • The code becomes harder to read, and to reason about: each file is implicitly assumed to be executed in some unspecified context.
  • It is harder to locate the functionality you are depending upon; as others have noted above this typically requires something like IDE support to track down.
  • Additional manual labour is required to ensure that entry_point.jl runs its includes in the correct order.
  • It becomes harder to locate old/dead code that isn’t depended upon by anything.

And moreover these issues are generally exacerbated once multiple developers are involved.
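As a rough illustration of the ordering point, here is a sketch with made-up file contents (the `Shape` type, the `area` function, and the module names are assumptions, not taken from any real package). The sources live in strings so the example is self-contained.

```julia
# Hypothetical file contents, kept in strings so the sketch runs on its own.
const ORDER_UTILS_SRC = """
struct Shape
    side::Float64
end
"""
const ORDER_A_SRC = "area(s::Shape) = s.side^2"

# Wrong order: A.jl is evaluated before utils.jl, so `Shape` is not yet defined
# when the method signature is evaluated, and an exception is thrown.
module WrongOrder end
err = try
    include_string(WrongOrder, ORDER_A_SRC)
    nothing
catch e
    e
end
println("wrong order failed: ", err !== nothing)

# Right order: utils.jl first, then A.jl.
module RightOrder end
include_string(RightOrder, ORDER_UTILS_SRC)
include_string(RightOrder, ORDER_A_SRC)
println(RightOrder.area(RightOrder.Shape(2.0)))
```

Nothing inside the A.jl source says that it needs utils.jl first; that knowledge lives only in whichever file performs the includes.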

I don’t think these issues are controversial – from earlier in this thread:
@oxinabox: “… It’s a fair complaint.”
@aplavin: “one of the inconveniences with the current include system is that there is literally no way to tell what are the dependencies of a specific source file”
(If either of you feel I’m misrepresenting your point of view here then do please let me know and I’ll take it out.)

So whilst the limitations of this approach are to some degree manageable, they are limitations, and ones with increasing bite as project size grows. It is not overstating my position to say that I think this is the single biggest limitation to work around when using the Julia language; at least that I’m aware of.

As an explicit example, try having a look through the source code for PyTorch. The Python bits (which follow the first pattern) are generally easy to follow. The C++ bits (which follow something akin to the second pattern) are generally difficult to follow.

Do note that ultimately this is all an issue about handling files – not modules, nor packages. (Despite the title of this thread – the focus on modules has been because they can be used as a potential solution.)

So what is the solution? (Beyond just putting up with it.) As far as I can tell, until now there hasn’t been one. PatModules.jl is one (deliberately simple) approach, but not one that I’m particularly wedded to. I think if a solution to this problem made its way into the language as a whole I’d probably advocate for a different more sophisticated option. But I shan’t get into that now – let’s focus on establishing whether there is an issue or not first.

Does that all make sense? What are your thoughts?

Correct – because both are installed as packages. (In this scenario Julia keeps a global reference of all imported packages and re-uses them if possible.) This discussion / my point is focused solely on the construction of a single package (or more generally some complicated blob of code), and ways to split code across multiple files when doing so.

Quite possibly I am wrong. I haven’t been convinced otherwise yet, but I promise you I am reading every reply, and trying not to be a zealot about anything.

I spent a fair bit of time searching around looking at existing solutions to this problem, and existing thoughts on how things may be improved:

Current recommended best practice 1
Current recommended best practice 2
Current way of performing relative imports
An example of what is done in existing major packages
A comparison to C++ (a language with the same basic issue)
#4600: a potential change, but not really a fix

With the general overview being that (a) the problem exists, (b) it has already been acknowledged, but (c) there are at present no good solutions.


Phew, that was a long post. Thank you to those that read it in its entirety.

PS: And since I didn’t comment on it earlier:

Thank you! It’s very flattering to be recognised “in the wild”.

24 Likes

I appreciate your thoughts. However, the whole thing is a bit abstract.

A and B depend on some functionality in utils. If that is something of interest in both, it could become a module that both should “use” (I often wish the keyword was use, not using ;-)). Including is not a very nice approach for allowing access to common functionality.

I think a concrete “for example” would be helpful.

I for one think that your foray into Julia esoterics is very interesting. Keep going! And, as some others already said, welcome!

6 Likes

I tried to talk about an example in the repo but I failed to even find the code…

https://github.com/pytorch/pytorch/blob/master/torch/linalg/__init__.py

What, you go to the linalg module and there’s no source code there?

And the reason why the code is so hard to read is precisely because it uses this nonlinear go-to architecture. IMO, everything should have a clear top level so the code is linear and can be read like a book, while nonlinear reading should be helped by tools (in any language). The problem with PyTorch is it doesn’t read like a book: there is no table of contents telling you what comes after another. There is no flow. You have to already understand the code in order to understand it since new code can come in from anywhere. You pick a file and try reading it, and… follow hyperlinks until you think you understand things? Well if you go to torch.linalg you don’t even find the linear algebra so good luck! (hint: it’s all global as we will see later, violating the very module rules that were argued for in the first place).

The style in OrdinaryDiffEq is linear. Here is your table of contents:

(Note, this holds for every Julia package!) It tells you exactly what comes in, the chapters in what order, and it also has the exports to tell you what will come next (it should use more import instead of using, but that’s a separate matter). You can read this start to finish in its intended order and nothing you don’t know will jump out at you. And actually, by design this has to be legible or else you get an error! So no go-to style of code design, instead you have one canonical way to understand the code. You could use other tools as an appendix to jump around, sure, but if you want to understand the logic you can always go back to the story.

There will always be people who prefer coding with go-tos, with a bunch of globals, and a bunch of dynamic scopes, but I think time has told us again and again that making things simple and making things constrained always is helpful sooner or later. And programming styles which make people want to just append everything to globals and import as * are (a) hard for people and (b) hard for tools.

So in a simplified sense, I think this whole discussion is phrased incorrectly. It should be understood as, “here’s a way to make using go-tos easier, so that as code flies in from left and right you can try and make sense of the random assortment of globals!”. But the real question to ask is, “have you tried making your code read linearly and reading code linearly?”. Because you skipped the chapters on the caches and then complained that we Game of Thrones’d the ending, and then from the Cliff Notes wrote a final essay saying that the characters were undeveloped. :man_shrugging:

15 Likes

If A.jl and B.jl are in the same module, then the main source file will typically have

include("utils.jl")
include("A.jl")
include("B.jl")

If A and B are different modules, then either the utilities could live in their own module, and both A and B would be using ThoseUtils, or alternatively either module could contain those utils and the other would be using it.
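A minimal sketch of the first option, with everything in one file for brevity. The wrapper name `SharedDemo`, the `Shape` type, and the functions are invented for illustration; in a real package each inner module would typically sit in its own included file.

```julia
module SharedDemo

# The shared utilities live in exactly one module.
module ThoseUtils
export Shape, side
struct Shape
    side::Float64
end
side(s::Shape) = s.side
end

# A and B both depend on ThoseUtils via a relative `using`.
module A
using ..ThoseUtils
area(s::Shape) = side(s)^2
end

module B
using ..ThoseUtils
perimeter(s::Shape) = 4 * side(s)
end

end # module SharedDemo

s = SharedDemo.ThoseUtils.Shape(2.0)
println(SharedDemo.A.area(s))
println(SharedDemo.B.perimeter(s))
```

Because there is only one definition of `Shape`, a value created through `ThoseUtils` dispatches correctly in both `A` and `B`.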

I still think you are laboring under a misunderstanding: this is not how include is used in Julia. You include a file within a single module once. I would really recommend reading

8 Likes

Could you please elaborate a bit how import solves the problem?
To give a simple example of what I meant by source files being non-self-contained:
Suppose I’m reading source of CSV.jl and come across this line (CSV.jl/detection.jl at 04a2cc7caa7eff226d8dadeae7d63d8d9867a19b · JuliaData/CSV.jl · GitHub):
makeunique([normalizenames ? normalizename(x) : Symbol(x) for x in names]).
Oh, I need something like this in my project, completely independent of CSV! OK, but where do these functions makeunique and normalizename come from? There are no imports in this file at all - so I don’t even know if these functions are in CSV.jl or some other package it depends on. One needs to use github search and hope that it points to the definition of these functions, if they are in CSV.jl at all. If they are in another package it’s even more complicated.
Contrast this to each file being a separate “module-like” entity, with imports at the top - something like:

using Parsers
using PooledArrays
using .utils

# code

Easy to guess that those two functions do not live in Parsers or PooledArrays, even if imported names are not listed explicitly. Then no search is required: they are contained in utils.jl.

2 Likes

Captain Tooling to the rescue!

julia> using CSV

julia> parentmodule(CSV.makeunique)
CSV

julia> methods(CSV.makeunique)
# 1 method for generic function "makeunique":
[1] makeunique(names) in CSV at /home/tamas/.julia/packages/CSV/la2cd/src/utils.jl:282

julia> parentmodule(CSV.normalizename)
CSV

julia> methods(CSV.normalizename)
# 2 methods for generic function "normalizename":
[1] normalizename(name::Symbol) in CSV at /home/tamas/.julia/packages/CSV/la2cd/src/utils.jl:274
[2] normalizename(name::String) in CSV at /home/tamas/.julia/packages/CSV/la2cd/src/utils.jl:275

Or, alternatively, learn about the relevant tools.
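The same kind of lookup can be tried without installing anything. This is only a sketch: the `Demo` module and its `normalize_name` function are hypothetical stand-ins, not CSV.jl code. Outside the REPL, `@which` needs the `InteractiveUtils` standard library to be loaded.

```julia
using InteractiveUtils  # provides @which in scripts; the REPL loads it automatically

# A stand-in module; the function is invented for this example.
module Demo
normalize_name(s::String) = Symbol(replace(s, ' ' => '_'))
end

# Which module owns the function?
println(parentmodule(Demo.normalize_name))

# Which method would this particular call dispatch to, and where is it defined?
m = @which Demo.normalize_name("a b")
println(m.module)
println(m.file, ":", m.line)
```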

6 Likes

Of course, many inconveniences can be reduced with tooling. And github search that I mentioned is a tool as well.

But the process is still more complicated than it could be. That is, one needs to run julia + add CSV#master + import CSV to use julia introspection functions - compared to simply looking up imports at the top of the source file. Moreover, if the package in question has heavy dependencies (e.g. RCall.jl), they will be installed as well, taking lots of time and bandwidth, even if not really required for the specific function. If the package repo is non-public or requires some kind of proxy to access then even more setup is needed. If the package is not compatible with my julia version… Etc.

Note I’m not saying anywhere that it’s impossible to find where a specific function is defined. It’s just significantly less convenient than it could be.

I would note that neither parentmodule nor methods are documented in that link, yet they are the functions you use to make your point. I was unaware of either, thanks. Your amazing efforts on the public list to inform people are really welcome; I wish the documentation included this wisdom. Julia’s documentation could use additional attention, based upon common threads in forum discussion. Conversely, the best forum responses are ones that point to clear/concise documentation.

7 Likes

module MyPackage
    include("utils.jl")
    include("A.jl")
    include("B.jl")
end

Then in your tests you import MyPackage and use the public (exported) or private interfaces as needed. I’ve not had a problem with this common pattern. Sure, it doesn’t encode that A and B are otherwise independent. However, does that really matter from a user’s perspective?
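Roughly, a test file for that pattern could look like the sketch below. The package body is inlined here (instead of the three includes) so the example runs on its own, and `area` and `_square` are made-up names.

```julia
using Test

# Stand-in for the package; in the real layout the body would be the includes.
module MyPackage
export area
_square(x) = x * x          # private helper, not exported
area(side) = _square(side)  # public, exported
end

using .MyPackage

results = @testset "MyPackage" begin
    # Public interface: available unqualified after `using`.
    @test area(3) == 9
    # Private interface: still reachable with a qualified name.
    @test MyPackage._square(4) == 16
end
```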

If someone is going to maintain your code, they’ll have to learn the dependency graph anyway; so this particular concern is probably the least of the worries a collaborator may have. For me, the primary challenge I have with maintaining code is the margin. I prefer code to wrap at 76 characters… or less. With 50 year old eyes and having not listened to my mum warning me about staring at the sun when I was 5yo, I need big fonts on a big display… or, better, to send the code to the printer for hard-copy. I notice that those who argue for 132 columns are often quite young with eagle eyes.

3 Likes

I was pointing out additional related tools.

This is your lucky day then: methods is mentioned in the Methods chapter (among other places), while the Modules chapter includes an example of parentmodule.

I certainly agree that the documentation could use additional attention, especially in the sense that more people could just read it.

4 Likes

I really appreciate your engagement in the topic, and I again want to commend your initiative. When things get prickly, it can be easy to disengage.

I don’t have much of technical merit to contribute, I just wanted to highlight that I don’t think your opinion itself is what’s controversial, lots of different coding styles and approaches can coexist happily. That’s one of the great things about this community - very few people are dogmatic about anything!

I think what came across poorly is the implication that we’ve all been doing it wrong/poorly. It’s totally fine to want a different structure - indeed one of the amazing things about the language is that you can make something like PatModules.jl to help the language conform to your preferences (and easily share it with others that feel the same way).

16 Likes

I’m curious about the following if this is the idiomatic case for Julia: how would you set up tests for a subset of what’s in the module, e.g. only for code in A? Importing the full package for tests implies that that module is fully functional (or at least syntactically correct). In the Python equivalent form you can import A as a self-contained module (as it always contains its dependencies) and so can write tests in a more fine-grained manner, independent of the state of the full package.

I do not think it is useful to try to test a part of a package in a way that is independent of the rest of the package working. If some subset of a package is independent of all the rest, then:

  1. It could be its own package.
  2. It can be an inner module, in a separate file, and you can include and import it in a test set just for it.

However, even if it is an inner module this does not mean it does not import anything from the parent module or sibling modules (an independent piece of code may be in its own file and module, but being in its own file and module does not mean it is an independent piece of code).
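For option 2, a sketch under stated assumptions: the file name stats.jl, the `Stats` module, and `midpoint` are hypothetical, and the file is written to a temporary directory only so that the example is runnable as-is. In a real package the test would include the existing source file instead.

```julia
using Test

# Write a hypothetical inner-module file to a temporary directory.
dir = mktempdir()
path = joinpath(dir, "stats.jl")
write(path, """
module Stats
export midpoint
midpoint(a, b) = (a + b) / 2
end
""")

# Include just that file and import just that module, not the whole package.
include(path)
using .Stats

results = @testset "Stats only" begin
    @test midpoint(1, 3) == 2.0
    @test midpoint(-1.0, 1.0) == 0.0
end
```

This only works cleanly when the inner module really is independent; if it imported names from a parent or sibling module, including the file on its own would fail.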

3 Likes

I use this same pattern (i.e., the utility/common/shared code is in files which are included in the “outer module”/“entry file”), but I do some extra encapsulation. Many of the included files have a module wrapping all of their code. Consequently, the dependence graph is coded at the top of each file by the import of specific functions and types from sibling modules.
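As a sketch of what that can look like (all names here are invented, and the three inner modules are shown in one file only to keep the example self-contained):

```julia
module Outer

# Would live in units.jl
module Units
struct Meters
    value::Float64
end
end

# Would live in geometry.jl: depends on Units only.
module Geometry
import ..Units: Meters
square_area(side::Meters) = side.value^2
end

# Would live in report.jl: depends on Units and Geometry.
module Report
import ..Units: Meters
import ..Geometry: square_area
describe(side::Meters) = "area = $(square_area(side))"
end

end # module Outer

println(Outer.Report.describe(Outer.Units.Meters(2.0)))
```

The import lines at the top of each inner module spell out which sibling names that file relies on.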

Personally, I found Julia’s reliance on include unusual at first for a very modern language - compared to Scala, for example, where the compiler figures dependencies out by itself. But now I’ve come to feel that the “simple” include mechanism actually encourages developers to think more carefully about the structure and inner inter-dependencies of their package(s). I certainly haven’t felt limited by it in any way.

One thing that came to my mind - if Julia should, at some point, become able to pre-compile code in parallel - might we then need a more automatic, graph-based mechanism? I’m not really expert enough regarding the Julia compiler to offer an opinion here, though. However, I would expect that the Julia compiler team has considered this already (and may already have a concept in the drawer for that eventuality?).

5 Likes