[ANN] PatModules.jl: a better module system for Julia

Tamas_Papp · December 26, 2020, 9:56am

If A.jl and B.jl are in the same module, then the main source file will typically have

include("utils.jl")
include("A.jl")
include("B.jl")

If A and B are different modules, then either the utilities could live in their own module, and both A and B would be using ThoseUtils, or alternatively one of the modules could contain those utils and the other would be using it.

I still think you are laboring under a misunderstanding: this is not how include is used in Julia. You include a file within a single module once. I would really recommend reading

https://docs.julialang.org/en/v1.7-dev/manual/modules/

aplavin · December 26, 2020, 10:13am

Could you please elaborate a bit how import solves the problem?
To give a simple example of what I meant by source files being non-self-contained:
Suppose I’m reading source of CSV.jl and come across this line (https://github.com/JuliaData/CSV.jl/blob/04a2cc7caa7eff226d8dadeae7d63d8d9867a19b/src/detection.jl#L191):
makeunique([normalizenames ? normalizename(x) : Symbol(x) for x in names]).
Oh, I need something like this in my project, completely independent of CSV! OK, but where do these functions makeunique and normalizename come from? There are no imports in this file at all - so I don’t even know if these functions are in CSV.jl or some other package it depends on. One needs to use github search and hope that it points to the definition of these functions, if they are in CSV.jl at all. If they are in another package it’s even more complicated.
Contrast this to each file being a separate “module-like” entity, with imports at the top - something like:

using Parsers
using PooledArrays
using .utils

# code

Easy to guess that those two functions do not live in Parsers or PooledArrays, even if imported names are not listed explicitly. Then no search is required: they are contained in utils.jl.

Tamas_Papp · December 26, 2020, 10:25am

Captain Tooling to the rescue!

julia> using CSV

julia> parentmodule(CSV.makeunique)
CSV

julia> methods(CSV.makeunique)
# 1 method for generic function "makeunique":
[1] makeunique(names) in CSV at /home/tamas/.julia/packages/CSV/la2cd/src/utils.jl:282

julia> parentmodule(CSV.normalizename)
CSV

julia> methods(CSV.normalizename)
# 2 methods for generic function "normalizename":
[1] normalizename(name::Symbol) in CSV at /home/tamas/.julia/packages/CSV/la2cd/src/utils.jl:274
[2] normalizename(name::String) in CSV at /home/tamas/.julia/packages/CSV/la2cd/src/utils.jl:275

Or, alternatively, learn about the relevant tools.
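
For anyone who wants to try these calls without installing CSV, here is a minimal sketch on a toy module. `Toy` and its `normalizename` methods are made up for illustration and are not CSV's implementation; only `parentmodule` and `methods` are the Base functions used in the session above.

```julia
# Toy stand-in for a package; the module and its methods are hypothetical.
module Toy
normalizename(name::Symbol) = name
normalizename(name::String) = Symbol(replace(name, " " => "_"))
end

# Which module owns the function?
println(parentmodule(Toy.normalizename))

# Where is each method defined? Every Method object records its module, file and line.
for m in methods(Toy.normalizename)
    println(m.module, ": ", m.file, ":", m.line)
end
```

The file and line printed depend on where the snippet is run, so they will differ from the paths in the session above.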

aplavin · December 26, 2020, 10:57am

Of course, many inconveniences can be reduced with tooling. And github search that I mentioned is a tool as well.

But the process is still more complicated than it could be. That is, one needs to run julia + add CSV#master + import CSV to use julia introspection functions - compared to simply looking up imports at the top of the source file. Moreover, if the package in question has heavy dependencies (e.g. RCall.jl), they will be installed as well taking lots of time and bandwidth, even if not really required for the specific function. If the package repo is non-public or requires some kind of proxy to access then even more setup is needed. If package is not compatible with my julia version… Etc.

Note I’m not saying anywhere that it’s impossible to find where a specific function is defined. It’s just significantly less convenient than could be.

cce · December 26, 2020, 1:09pm

I would note that neither parentmodule nor methods are documented in that link, yet they are the functions you use to make your point. I was unaware of either, thanks. Your amazing efforts on the public list to inform people are really welcome; I wish the documentation included this wisdom. Julia’s documentation could use additional attention, based upon common threads in forum discussion. Conversely, the best forum responses are ones that point to clear/concise documentation.

cce · December 26, 2020, 1:22pm

module MyPackage
    include("utils.jl")
    include("A.jl")
    include("B.jl")
end

Then in your tests you import MyPackage and use the public (exported) or private interfaces as needed. I’ve not had a problem with this common pattern. Sure, it doesn’t encode that A and B are otherwise independent. However, does that really matter from a user’s perspective?

If someone is going to maintain your code, they’ll have to learn the dependency graph anyway; so this particular concern is probably the least of the worries a collaborator may have. For me, the primary challenge I have with maintaining code is the margin. I prefer code to wrap at 76 characters… or less. With 50 year old eyes and having not listened to my mum warning me about staring at the sun when I was 5yo, I need big fonts on a big display… or, better, to send the code to the printer for hard-copy. I notice that those who argue for 132 columns are often quite young with eagle eyes.

Tamas_Papp · December 26, 2020, 2:00pm

I was pointing out additional related tools.

This is your lucky day then: methods is mentioned in the Methods chapter (among other places), while the Modules chapter includes an example of parentmodule.

I certainly agree that the documentation could use additional attention, especially in the sense that more people could just read it.

kevbonham · December 26, 2020, 2:15pm

I really appreciate your engagement in the topic, and I again want to commend your initiative. When things get prickly, it can be easy to disengage.

I don’t have much of technical merit to contribute, I just wanted to highlight that I don’t think your opinion itself is what’s controversial, lots of different coding styles and approaches can coexist happily. That’s one of the great things about this community - very few people are dogmatic about anything!

I think what came across poorly is the implication that we’ve all been doing it wrong/poorly. It’s totally fine to want a different structure - indeed one of the amazing things about the language is that you can make something like PatModules.jl to help the language conform to your preferences (and easily share it with others that feel the same way).

paulmelis · December 26, 2020, 3:54pm

cce:

module MyPackage
    include("utils.jl")
    include("A.jl")
    include("B.jl")
end
Then in your tests you import MyPackage and use the public (exported) or private interfaces as needed. I’ve not had a problem with this common pattern. Sure, it doesn’t encode that A and B are otherwise independent. However, does that really matter from a user’s perspective?

I’m curious about the following if this is the idiomatic case for Julia: how would you set up tests for a subset of what’s in the module, e.g. only for code in A? Importing the full package for tests implies that that module is fully functional (or at least syntactically correct). In the Python equivalent form you can import A as a self-contained module (as it always contains its dependencies) and so can write tests in a more fine-grained manner, independent of the state of the full package.

Henrique_Becker · December 26, 2020, 4:30pm

I do not think it is useful to try to test a part of a package in a way that is independent of the rest of the package working. If some subset of a package is independent of all the rest, then:

It could be its own package.
It can be an inner module, in a separate file, and you can include and import it in a test set just for it.

However, even if it is an inner module this does not mean it does not import anything from the parent module or sibling modules (an independent piece of code may be in its own file and module, but being in its own file and module does not mean it is an independent piece of code).
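
A rough sketch of the second option, assuming a hypothetical inner module `Geometry` that lives in its own file. The file is written to a temporary directory here only so the example is self-contained; in a real package it would already exist under src/.

```julia
using Test

# Write a stand-in for a source file such as src/Geometry.jl (hypothetical name)
# whose code is wrapped in its own module.
dir = mktempdir()
path = joinpath(dir, "Geometry.jl")
write(path, """
module Geometry
export area
area(w, h) = w * h
end
""")

include(path)     # evaluate only this file, not the whole package
using .Geometry   # bring its exports into scope

@testset "Geometry only" begin
    @test area(2, 3) == 6
    @test Geometry.area(1.5, 2) == 3.0
end
```

This only works as long as the inner module does not reach into its parent or siblings, which is exactly the caveat in the paragraph above.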

Henrique_Becker · December 26, 2020, 4:36pm

I use this same pattern (i.e., the utility/common/shared code is in files which are included in the “outer module”/“entry file”), but I do some extra encapsulation. Many of the included files have a module wrapping all of its code. Consequently, the dependence graph is coded at the top of each file by the import of specific functions and types from sibling modules.
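
A minimal sketch of that layout, with everything inlined into one block so it runs as is. The names `MyPackage`, `Utils`, `A` and `dedupe_columns` are placeholders, and in a real package each inner module would sit in its own included file.

```julia
module MyPackage

# In a real package this would be include("Utils.jl"); include("A.jl"),
# with each file wrapping its code in the module shown here.
module Utils
makeunique(names) = unique(names)
end

module A
import ..Utils: makeunique   # dependency on the sibling module, stated up front
dedupe_columns(names) = makeunique(names)
end

end # module MyPackage

println(MyPackage.A.dedupe_columns([:a, :b, :a]))   # prints [:a, :b]
```

The `import ..Utils: makeunique` line at the top of `A` is what records the edge in the dependency graph.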

oschulz · December 26, 2020, 4:47pm

Personally, I found Julia’s reliance on include unusual at first for a very modern language - compared to Scala, for example, where the compiler figures dependencies out by itself. But now I’ve come to feel that the “simple” include mechanism actually encourages developers to think more carefully about the structure and inner inter-dependencies of their package(s). I certainly haven’t felt limited by it in any way.

One thing that came to my mind - if Julia should, at some point, become able to pre-compile code in parallel - might we then need a more automatic, graph-based mechanism? I’m not really expert enough regarding the Julia compiler to offer an opinion here, though. However, I would expect that the Julia compiler team has considered this already (and may already have a concept in the drawer for that eventuality?).

tbeason · December 26, 2020, 4:57pm

Code organization is tough stuff.

doomphoenix-qxz · December 26, 2020, 7:38pm

cce:

patrick-kidger:

I’m constructing a module/package/some large blob of code.
I have two files A.jl and B.jl , which depend upon some common functionality. The typical pattern is to factor this out into some other file, in my case often with an unimaginative name like utils.jl .
module MyPackage
    include("utils.jl")
    include("A.jl")
    include("B.jl")
end

Contrary to what is suggested here (and not just by cce but by several others in this thread) the OP is aware of this style of code organization, has examined it carefully, and has decided it is insufficient for his purposes. He explained this in his long post above:

OP has written something that I certainly wished for many times when learning Julia. He has also made a good argument that there’s a real use case for PatModules.jl: When you are developing a large package and you write functionality (like utils.jl) used in several places in your package, but for whatever reason, don’t want to split utils.jl into its own package, and yet you want to be clear where the functions in utils.jl come from so they don’t appear as random names all over your package.

Even if this is not a reasonable thing to want, the OP wanted it and went to the trouble to implement it himself before telling us about it, and is not forcing anyone to use it. Perhaps his initial tone was not optimal, but he has tried to clarify that he’s not trying to insult anyone. The great thing about open source software is that it allows people to write whatever they want, and the great thing about Julia is that it makes this sort of experimentation easier (can you imagine rewriting Python import logic in a package?).

@patrick-kidger I say well done, and time will tell if this package provides a sufficiently elegant solution to a sufficiently common problem that it gains more traction. I wouldn’t give too much weight to the initial reactions here; just keep making PatModules.jl awesome. If it’s awesome enough, people will use it.

PetrKryslUCSD · December 26, 2020, 7:56pm

I think not distinguishing between organization of code into files and organization of code into modules is obscuring the real issues in this discussion. My code is usually divided into modules, with one file per module. I haven’t found any use for multiple files per module or vice versa.

I still don’t see what having the common functionality in the file utils.jl may mean that would necessitate additional structure in addition to what Julia already provides.

Skoffer · December 26, 2020, 10:06pm

I think your example shows what the real problem is here. I think there are two sides to this problem: those who write long and complicated packages and those who want to somehow extend these packages. The first group is happy with the structure “one file that includes everything else and multiple other files which presume that context is already defined”, and the other group argues that such a structure leads to worse discoverability, because you literally need to refer constantly to this “table of contents” and apply some guesswork as to what is defined where.

Now, regarding your question, developers can already do “discoverable” file structure, for example by writing at the beginning of each file, something like

# used utils: makeunique, normalizename

Note the comment line here. This approach solves the issue of discoverability (where functions come from) without breaking any code at all, but at the price of the extra work of adding these verbose explanations. Of course, the usual developer wouldn’t go that far and write such a comment.

From this point of view, I think it’s easier to understand why the proposed file organization doesn’t look appealing. Instead of adding commented lines of function origin, the language or package forces them on the developer, with the only benefit being that they may omit the big “table of contents” file. It doesn’t look like a fair deal: more work with no real benefit except better discoverability for possible outside developers.

As an extreme example, imagine that you want to go the matlab way and put each function in a separate file. Imagine the amount of work to make each and every file self-contained with complete context. And imagine the amount of work that you’ll have to do if for some reason you change the name of one of the files…

With that said, the problem of discoverability does exist. Maybe it is possible to extend Documenter.jl or some other similar package to automatically build something analogous to a sitemap, only for a package, so you can generate an html/md/toml file where each and every function can be traced to the file where it is defined.
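
As a rough illustration of what such a sitemap could look like, here is a sketch that uses only Base reflection. It is not a Documenter.jl feature; `sitemap` and the `Demo` module are hypothetical names made up for this example.

```julia
# Toy module standing in for a package; names are hypothetical.
module Demo
makeunique(names) = unique(names)
normalizename(name::String) = Symbol(replace(name, " " => "_"))
end

# Map every function owned by `mod` to the "file:line" of each of its methods.
function sitemap(mod::Module)
    entries = Dict{Symbol,Vector{String}}()
    for name in names(mod; all=true)
        startswith(String(name), "#") && continue   # skip compiler-generated names
        isdefined(mod, name) || continue
        f = getfield(mod, name)
        (f isa Function && parentmodule(f) === mod) || continue
        entries[name] = [string(m.file, ":", m.line) for m in methods(f)]
    end
    return entries
end

for (name, locations) in sitemap(Demo)
    println(name, " => ", join(locations, ", "))
end
```

The result could then be written out as md or toml; functions that only add methods to generics owned by other modules (for example `Base.push!`) would need extra handling that this sketch leaves out.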

MA_Laforge · December 26, 2020, 10:09pm

I can’t be 100% sure because I am not looking at it from the same vantage point as you @patrick-kidger, but your thought pattern & solutions remind me a lot of my own confusion when I started making modules/packages/namespaces in Julia.

Header guards, as an example

“using header guards” is one of those things that got to me too - coming from a C/C++ background.

C/C++ vs Julia

But there are huge differences with include() in Julia: You don’t include header files in Julia! In C/C++, header files are used to tell the compiler how much space to allocate to objects, where data offsets are, and where to push data on the call stack before executing a function call.

That’s because C/C++ compiles machine code only, and doesn’t retain a table of that information anywhere (At least I don’t think it does when you turn off the debugger options).

Julia, on the other hand keeps this information around when it “compiles” its code (so I guess it does more than just compile). And this is a big difference. It’s also why Julia can do introspection so well (you can even access the original function code if you want).

When to use Julia `include()`s

First of all, you don’t include() external packages. You should only include() files that are in your current project or package (barring exceptional, possibly questionable circumstances). When you want to use external packages, you need to use import or using (but I won’t talk about those right now).

So right there, this is different from C/C++. Whereas in C/C++ you include header files that are built independently of your current project, in Julia, you only should be including code/files that you know are directly part of your project/package.

That means you can guarantee there is no double inclusion because you, as the developer, have control over this entire project/package codebase. Moreover, your code files need only be included once (it is not the same thing as C/C++ headers). That’s why you’ve noticed people often include() a bunch of subfiles from the same master file. Once include()-ed, that code is available to any other module loaded by Julia (There is no information hiding like there is in C/C++).

So how do Julia `include()`-s map to C/C++?

In reality, a Julia include() statement probably relates more closely to using the C/C++ linker:

g++ -o myexecutable first.o second.o third.o

In Julia, this would look like:

#myexecutable.jl

include("first.jl")
include("second.jl")
include("third.jl")
...

That’s because the Julia interpreter/compiler actually loads that code into memory and is ready to use it as soon as it processes the include() statement. It is not a preprocessor directive like it is in C/C++.
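
A small sketch of that behaviour, assuming a throwaway file written to a temporary directory (the file name and the `greet` function are made up for the example):

```julia
# Write a small stand-in source file (hypothetical name and contents).
path = joinpath(mktempdir(), "first.jl")
write(path, """
greet(name) = "hello, " * name
""")

# include parses and evaluates the file right here, in the current module.
include(path)

# The definitions from the file are usable immediately afterwards.
println(greet("world"))   # prints hello, world
```

Nothing is declared ahead of time: before the `include` call runs, `greet` simply does not exist in the enclosing module.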

So how do you call code from another module, then?

Well, unlike C/C++, you don’t need to read in a header file to execute code, or build new structs in Julia. Once the code is loaded through an include() statement somewhere, your code can just execute it directly as long as it knows what the module path is:

#MyProject/src/subfile1.jl

#Why not define a submodule here? It doesn't have to have the same name as the file.
#"module"s are really just namespaces, and are not tied to files in any way.
module SubA

module B #Again, why not another namespace here?
struct MyStruct
    x::Int; y::Int
end
end #module B

function dosomething(obj::B.MyStruct)
    #Do something
end

end #module SubA

#This function will be in the same namespace ("module") as whatever code
#called include("subfile1.jl"):
function dosomethingelse(obj::SubA.B.MyStruct)
    #Do something
end

#MyProject/src/MyProject.jl

#Julia projects & packages need to declare a module (namespace)
#with the same name as the project to function correctly.  Julia also
#expects you to create a file under src/ that has the same name
#as the project (thus /src/MyProject.jl).
module MyProject

include("subfile1.jl")

#Create a global object to store state:
glb_obj = SubA.B.MyStruct(3,5)

#Call functions that might have been written/loaded in other files:

SubA.dosomething(glb_obj)
dosomethingelse(glb_obj)

#...
end #module MyProject

Is that the same way we load external packages?

No, not exactly, code from external packages (even if not registered in Julia’s “General” registry) should be indirectly included with either the using or import statements:

#MyProject/src/MyProject.jl

#Julia projects & packages need to declare a module (namespace)
#with the same name as the project to function correctly.  Julia also
#expects you to create a file under src/ that has the same name
#as the project (thus /src/MyProject.jl).

module MyProject
using CSV

CSV.dosomething() #FYI: Doesn't actually exist.

#...
end #module MyProject

Note that Julia ensures that only a single instance of CSV is loaded - no matter how many modules (namespaces) call “using CSV”. It also ensures that ALL modules calling using CSV get a local pointer to the same CSV module (namespace), which gets loaded exactly once into memory (unless it needs to be re-evaluated for some reason - yeah. there are a bunch of exception cases, sorry.).
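
A quick way to check the single-instance claim, sketched with the `Dates` standard library so that nothing needs to be installed (the module names `First` and `Second` are placeholders):

```julia
module First
using Dates
end

module Second
using Dates
end

# Both modules refer to the very same module object.
println(First.Dates === Second.Dates)   # true
```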

MA_Laforge · December 26, 2020, 10:26pm

I suggest looking at the following as well:
→ Dependencies of src files inside a package

I also created a doc PR to address this issue (I don’t suggest reading the PR thread because it’s a bit hard to read).

Instead, I’ll insert the PR text directly here:

Inline text from PR

Julia ⇔ C/C++: Namespaces

C/C++ namespaces correspond roughly to Julia modules.
There are no private functions/variables/modules/… in Julia. Everything is accessible
through fully qualified paths (or relative paths, if desired).
using MyNamespace::myfun (C++) corresponds roughly to import MyModule: myfun (Julia).
using namespace MyNamespace (C++) corresponds roughly to using MyModule (Julia)
- In Julia, only exported symbols are made available to the calling module.
- In C++, only elements found in the included (public) header files are made available.
Caveat: import/using keywords (Julia) also load modules (see below).
Caveat: import/using (Julia) works only at the global scope level (modules)
- In C++, using namespace X works within arbitrary scopes (ex: function scope).

Julia ⇔ C/C++: Module loading

When you think of a C/C++ “library”, you are likely looking for a Julia “package”.
- Caveat: C/C++ libraries often house multiple “software modules” whereas Julia
  “packages” typically house one.
- Reminder: Julia modules are global scopes (not necessarily “software modules”).
Instead of build/make scripts, Julia uses “Project Environments” (sometimes called
either “Project” or “Environment”).
- Build scripts are only needed for more complex applications
  (like those needing to compile, or download C/C++ executables ).
- C/C++ code typically target more conventional applications, whereas Julia
  “Project Environments” provide a set of packages to experiment with particular problem
  spaces. Julia users typically use problem-specific “scripts” for this type of experimentation.
- To develop a “conventional” application/project in Julia, you can initialize its root directory
  as a “Project Environment”, and house application-specific code/packages there.
  This provides good control over project dependencies, and future reproducibility.
- Available packages are added to a “Project Environment” with the pkg> add tool
  (This does not load said package, however).
- The list of available packages (direct dependencies) for a “Project Environment” are
  saved in its Project.toml file.
- The full dependency information for a “Project Environment” is auto-generated & saved
  in its Manifest.toml file.
Packages (“software modules”) available to the “Project Environment” are loaded with
import or using.
- In C/C++, you #include <moduleheader> to get object/function declarations, and link in
  libraries when you build the executable.
- In Julia, whatever is loaded is available to all other loaded modules through its
  fully qualified path (no header file required).
- Use import SomePkg: SubModule.SubSubmodule (Julia) to access package submodules.
Directory-based package repositories (Julia) can be made available by adding repository
paths to the Base.LOAD_PATH array.
- Packages from directory-based repositories do not require the pkg> add tool prior to
  being loaded with import or using. They are simply available to the project.
- Directory-based package repositories are the quickest solution to developing local
  libraries of “software modules”.
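
A hedged sketch of a directory-based repository: the package name `LocalDemoPkg` and its contents are invented, and the repository is a temporary directory so the example is self-contained. It assumes the usual `[PkgName]/src/[PkgName].jl` layout described above.

```julia
# Build a throwaway directory-based repository holding one hypothetical package.
repo = mktempdir()
mkpath(joinpath(repo, "LocalDemoPkg", "src"))
write(joinpath(repo, "LocalDemoPkg", "src", "LocalDemoPkg.jl"), """
module LocalDemoPkg
greet() = "hello from LocalDemoPkg"
end
""")

# Make the repository visible to import/using; no pkg> add step is involved.
push!(LOAD_PATH, repo)

import LocalDemoPkg
println(LocalDemoPkg.greet())
```

In day-to-day use the `push!(LOAD_PATH, ...)` line would typically point at a permanent directory rather than a temporary one.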

Julia ⇔ C/C++: Assembling modules

In C/C++, .c/.cpp files are compiled & added to a library with build/make scripts.
- In Julia, import [PkgName]/using [PkgName] statements load [PkgName].jl located
  in a package’s [PkgName]/src/ subdirectory.
- In turn, [PkgName].jl typically loads associated source files with calls to
  include("[someotherfile].jl").
include("./path/to/somefile.jl") (Julia) is very similar to
#include "./path/to/somefile.jl" (C/C++).
- However include("...") (Julia) is not used to include header files (not required).
- Do not use include("...") (Julia) to load code from other “software modules”
  (use import/using instead).
- include("path/to/some/module.jl") (Julia) would instantiate multiple versions of the
  same code in different modules (creating distinct types (etc.) with the same names).
- include("somefile.jl") is typically used to assemble multiple files within the same
  Julia package (“software module”). It is therefore relatively straightforward to ensure
  files are included only once (No #ifdef confusion).
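
To make the "distinct types with the same names" point concrete, here is a small sketch. It uses `include_string` on a source string in place of including a real file twice, and the `Point` type and module names are hypothetical.

```julia
# Source text standing in for a file such as path/to/some/module.jl.
const SHARED_SOURCE = """
struct Point
    x::Int
    y::Int
end
"""

module First
end

module Second
end

# Evaluating the same source in two modules mimics include-ing the same file twice.
include_string(First, SHARED_SOURCE)
include_string(Second, SHARED_SOURCE)

println(First.Point === Second.Point)         # false: two unrelated types
println(First.Point(1, 2) isa Second.Point)   # false
```

A method written for `First.Point` will therefore not accept a `Second.Point`, which is why shared code should be loaded once and reached through import/using.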

Julia ⇔ C/C++: Module interface

C++ exposes interfaces using “public” .h/.hpp files whereas Julia modules export
symbols that are intended for their users.
- Often, Julia modules simply add functionality by generating new “methods” to existing
  functions (ex: Base.push!).
- Developers of Julia packages therefore cannot rely on header files for interface
  documentation.
- Interfaces for Julia packages are typically described using docstrings, README.md,
  static web pages, …
Some developers choose not to export all symbols required to use their package/module.
- Users might be expected to access these components by qualifying functions/structs/…
  with the package/module name (ex: MyModule.run_this_task(...)).
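
A minimal sketch of the difference between exported and unexported names. `MyModule.run_this_task` follows the example above; `helper` is a made-up internal function.

```julia
module MyModule
export run_this_task

run_this_task(x) = helper(x) + 1   # part of the exported interface
helper(x) = 2x                     # not exported, still reachable when qualified

end

using .MyModule

println(run_this_task(1))             # 3, available unqualified because it is exported
println(MyModule.helper(1))           # 2, needs the module prefix
println(:helper in names(MyModule))   # false, names lists only the public names by default
```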

Julia ⇔ C/C++: Quick reference

| Software Concept | Julia | C/C++ |
| --- | --- | --- |
| unnamed scope | `begin` … `end` | `{` … `}` |
| function scope | `function x()` … `end` | `int x() {` … `}` |
| global scope | `module MyMod` … `end` | `namespace MyNS {` … `}` |
| software module | A Julia “package” | `.h`/`.hpp` files +compiled `somelib.a` |
| assembling software modules | `SomePkg.jl`: … `include("subfile1.jl")` `include("subfile2.jl")` … | `$(AR) *.o` ⇒ `somelib.a` |
| import software module | `import SomePkg` | `#include <somelib>` +link in `somelib.a` |
| module library | `LOAD_PATH[]`, *Git repository, **custom package registry | more `.h`/`.hpp` files +bigger compiled `somebiglib.a` |

* The Julia package manager supports registering multiple packages from a single Git repository. This allows users to house a library of related packages in a single repository.

** Julia registries are primarily designed to provide versioning & distribution of packages. Custom package registries can be used to create a type of module library.

aplavin · December 27, 2020, 12:46pm

Thanks for sharing your view on this.

Skoffer:

Now, regarding your question, developers can already do “discoverable” file structure, for example by writing at the beginning of each file, something like
# used utils: makeunique, normalizename

Comments can and will easily become obsolete, as they are not checked by the compiler at all. Thus one cannot rely on them.
And I didn’t propose to list all names in every import, just the opposite! Even without such lists it’s pretty obvious that makeunique and normalizename likely lie in .utils, and not in Parsers.

But why would you want to go “matlab way” in julia? Typically there are many short functions, and it’s just crazy to put each into separate file.

Also, my previous message here presented the view from one PoV: that of a user who reads package code and wants to understand something there. Below is my view as a “package developer”.

Take for example a package of mine: Alexander Plavin / SquashFS.jl · GitLab. The source file api.jl contains main functions that are supposed to be called by users - the package public API. E.g. SquashFS.open, SquashFS.readdir, etc. Other files, such as utils.jl and sqfs_structs.jl, contain lower-level functions that are supposed to be internal.
Initially (and currently) I follow the typical/suggested julian approach: don’t make any separate modules within the package. But this means all the internal functions are available as SquashFS.xxxxx and show in autocomplete - confusing users. If every source file was implicitly “module-like” I would just put something like using .api into main SquashFS.jl file. Internal functions would then automatically be accessible as SquashFS.utils.xxxxx.
As this is not the case, I’m thinking of creating a submodule like Internal manually, but it’s not really clear for me where to define it. And as I understand from this thread, submodules within a single package are actually discouraged…

stevengj · December 27, 2020, 1:39pm

They are perfectly fine. Segregating code into lots of submodules may be a bad idea, and submodules that are usable elsewhere might be better as packages, but a single Internal submodule for namespace purposes is not unreasonable.

Just create a file Internal.jl with module Internal ... end that includes your other files. Then in your main SquashFS.jl file do include("Internal.jl") and access its names as Internal.foo (or do using .Internal to get any exports).
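
A sketch of that arrangement, inlined into a single block so it can be run directly. `SquashDemo`, `read_header`, `checksum` and `describe` are placeholder names, not the actual SquashFS.jl code.

```julia
# Hypothetical stand-in for a package; in the layout described above, the
# Internal module would live in Internal.jl and include utils.jl and friends.
module SquashDemo

module Internal
read_header(data) = data[1:4]   # low-level helper
checksum(data) = sum(data)      # another internal helper
end

# Public API function built on the internals, accessed with the Internal prefix.
describe(data) = (header = Internal.read_header(data), sum = Internal.checksum(data))

end # module SquashDemo

println(SquashDemo.describe([1, 2, 3, 4, 5]))

# The helpers live one level down, so they do not clutter the top-level namespace.
println(isdefined(SquashDemo, :checksum))            # false
println(isdefined(SquashDemo.Internal, :checksum))   # true
```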
