[ANN] PatModules.jl: a better module system for Julia

Personally, I found Julia’s reliance on include unusual at first for a very modern language - compared to Scala, for example, where the compiler figures dependencies out by itself. But now I’ve come to feel that the “simple” include mechanism actually encourages developers to think more carefully about the structure and inner inter-dependencies of their package(s). I certainly haven’t felt limited by it in any way.

One thing that came to my mind - if Julia should, at some point, become able to pre-compile code in parallel - might we then need a more automatic, graph-based mechanism? I’m not really expert enough regarding the Julia compiler to offer an opinion here, though. However, I would expect that the Julia compiler team has considered this already (and may already have a concept in the drawer for that eventuality?).

5 Likes

Code organization is tough stuff.

7 Likes

Contrary to what is suggested here (and not just by cce but by several others in this thread) the OP is aware of this style of code organization, has examined it carefully, and has decided it is insufficient for his purposes. He explained this in his long post above:

OP has written something that I certainly wished for many times when learning Julia. He has also made a good argument that there’s a real use case for PatModules.jl: When you are developing a large package and you write functionality (like utils.jl) used in several places in your package, but for whatever reason, don’t want to split utils.jl into its own package, and yet you want to be clear where the functions in utils.jl come from so they don’t appear as random names all over your package.

Even if this is not a reasonable thing to want, the OP wanted it and went to the trouble to implement it himself before telling us about it, and is not forcing anyone to use it. Perhaps his initial tone was not optimal, but he has tried to clarify that he’s not trying to insult anyone. The great thing about open source software is that it allows people to write whatever they want, and the great thing about Julia is that it makes this sort of experimentation easier (can you imagine rewriting Python import logic in a package?).

@patrick-kidger I say well done, and time will tell if this package provides a sufficiently elegant solution to a sufficiently common problem that it gains more traction. I wouldn’t give too much weight to the initial reactions here; just keep making PatModules.jl awesome. If it’s awesome enough, people will use it.

36 Likes

I think not distinguishing between organization of code into files and organization of code into modules is obscuring the real issues in this discussion. My code is usually divided into modules, with one file per module. I haven’t found any use for multiple files per module or vice versa.

I still don’t see what having the common functionality in the file utils.jl may mean that would necessitate additional structure in addition to what Julia already provides.

4 Likes

I think your example shows what the real problem is here. There are two sides to it: those who write long and complicated packages, and those who want to somehow extend these packages. The first group is happy with the structure “one file that includes everything else, plus multiple other files which presume that context is already defined”. The other group argues that such a structure leads to worse discoverability, because you constantly need to refer back to this “table of contents” and guess where each thing is defined.

Now, regarding your question: developers can already create a “discoverable” file structure, for example by writing something like this at the beginning of each file:

# used utils: makeunique, normalizename

Note that this is a comment line. This approach solves the issue of discoverability (where functions come from) without breaking any code at all, but at the price of the extra work of adding these verbose explanations. Of course, the usual developer wouldn’t go that far and write such a comment.

From this point of view, I think it’s easier to understand why the proposed file organization doesn’t look appealing. Instead of optional comment lines describing where functions come from, the language or package forces them on the developer, with the only benefit being that they may omit the big “table of contents” file. It doesn’t look like a fair deal: more work with no real benefit other than better discoverability for possible outside developers.

As an extreme example, imagine that you want to go the MATLAB way and put each function in a separate file. Imagine the amount of work needed to make each and every file self-contained with complete context. And imagine the amount of work you’ll have to do if for some reason you change the name of one of the files…

With that said, the problem of discoverability does exist. Maybe it is possible to extend Documenter.jl or some similar package to automatically build something analogous to a sitemap, only for a package, so you can generate an html/md/toml file where each and every function can be traced to the file where it is defined.
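A rough version of such a “sitemap” can already be put together with Julia’s reflection functions, without touching Documenter.jl. The sketch below is only an illustration: the Demo module and the definition_sites helper are made-up names, and it only covers functions (not types or constants).

```julia
# Hypothetical module standing in for a real package.
module Demo
greet(name) = "hello, $name"
shout(name) = uppercase(greet(name))
end # module Demo

# Map every function defined in `mod` to the "file:line" of each of its methods.
function definition_sites(mod::Module)
    sites = Dict{Symbol,Vector{String}}()
    for name in names(mod; all=true)
        startswith(String(name), "#") && continue  # skip compiler-generated names
        isdefined(mod, name) || continue
        f = getfield(mod, name)
        f isa Function || continue
        sites[name] = ["$(m.file):$(m.line)" for m in methods(f)]
    end
    return sites
end

sites = definition_sites(Demo)
for (name, locations) in sites
    println(name, " => ", join(locations, ", "))
end
```

The resulting dictionary could then be written out as markdown or TOML by whatever tool builds the documentation.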

1 Like

I can’t be 100% sure because I am not looking at it from the same vantage point as you @patrick-kidger, but your thought pattern & solutions remind me a lot of my own confusion when I started making modules/packages/namespaces in Julia.

Header guards, as an example

“using header guards” is one of those things that got to me too - coming from a C/C++ background.

C/C++ vs Julia

But there are huge differences with include() in Julia: You don’t include header files in Julia! In C/C++, header files are used to tell the compiler how much space to allocate to objects, where data offsets are, and where to push data on the call stack before executing a function call.

That’s because C/C++ compiles machine code only, and doesn’t retain a table of that information anywhere (At least I don’t think it does when you turn off the debugger options).

Julia, on the other hand keeps this information around when it “compiles” its code (so I guess it does more than just compile). And this is a big difference. It’s also why Julia can do introspection so well (you can even access the original function code if you want).

When to use Julia include()s

First of all, you don’t include() external packages. You should only include() files that are in your current project or package (barring exceptional, possibly questionable circumstances). When you want to use external packages, you need to use import or using (but I won’t talk about those right now).

So right there, this is different from C/C++. Whereas in C/C++ you include header files that are built independently of your current project, in Julia, you only should be including code/files that you know are directly part of your project/package.

That means you can guarantee there is no double inclusion because you, as the developer, have control over this entire project/package codebase. Moreover, your code files need only be included once (it is not the same thing as C/C++ headers). That’s why you’ve noticed people often include() a bunch of subfiles from the same master file. Once include()-ed, that code is available to any other module loaded by Julia (There is no information hiding like there is in C/C++).

So how do Julia include()-s map to C/C++?

In reality, a Julia include() statement probably relates more closely to using the C/C++ linker:

g++ -o myexecutable first.o second.o third.o 

In Julia, this would look like:

#myexecutable.jl

include("first.jl")
include("second.jl")
include("third.jl")
...

That’s because the Julia interpreter/compiler actually loads that code into memory and is ready to use it as soon as it processes the include() statement. It is not a preprocessor directive like it is in C/C++.
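Here is a small self-contained sketch of that behaviour. The file names and their contents are made up for the demonstration and are written to a temporary directory so the snippet can be run as a plain script.

```julia
# Write two throwaway source files (hypothetical names and contents).
dir = mktempdir()
write(joinpath(dir, "first.jl"), "first_value() = 1\n")
write(joinpath(dir, "second.jl"), "second_value() = first_value() + 1\n")

# Each include() evaluates the file right away, in the current module.
include(joinpath(dir, "first.jl"))
include(joinpath(dir, "second.jl"))

# No header or declaration was needed for second.jl to see first_value().
result = second_value()
println(result)
```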

So how do you call code from another module, then?

Well, unlike C/C++, you don’t need to read in a header file to execute code, or build new structs in Julia. Once the code is loaded through an include() statement somewhere, your code can just execute it directly as long as it knows what the module path is:

#MyProject/src/subfile1.jl

#Why not define a submodule here? It doesn't have to have the same name as the file.
#"module"s are really just namespaces, and are not tied to files in any way.
module SubA

module B #Again, why not another namespace here?
struct MyStruct
    x::Int; y::Int
end
end #module B

function dosomething(obj::B.MyStruct)
    #Do something
end

end #module SubA

#This function will be in the same namespace ("module") as whatever code
#called include("subfile1.jl"):
function dosomethingelse(obj::SubA.B.MyStruct)
    #Do something
end

#MyProject/src/MyProject.jl

#Julia projects & packages need to declare a module (namespace)
#with the same name as the project to function correctly.  Julia also
#expects you to create a file under src/ that has the same name
#as the project (thus /src/MyProject.jl).
module MyProject

include("subfile1.jl")

#Create a global object to store state:
glb_obj = SubA.B.MyStruct(3,5)

#Call functions that might have been written/loaded in other files:

SubA.dosomething(glb_obj)
dosomethingelse(glb_obj)

#...
end #module MyProject

Is that the same way we load external packages?

No, not exactly. Code from external packages (even if not registered in Julia’s “General” registry) should be loaded indirectly, with either the using or import statements:

#MyProject/src/MyProject.jl

#Julia projects & packages need to declare a module (namespace)
#with the same name as the project to function correctly.  Julia also
#expects you to create a file under src/ that has the same name
#as the project (thus /src/MyProject.jl).

module MyProject
using CSV

CSV.dosomething() #FYI: Doesn't actually exist.

#...
end #module MyProject

Note that Julia ensures that only a single instance of CSV is loaded - no matter how many modules (namespaces) call “using CSV”. It also ensures that ALL modules calling using CSV get a local pointer to the same CSV module (namespace), which gets loaded exactly once into memory (unless it needs to be re-evaluated for some reason - yeah. there are a bunch of exception cases, sorry.).
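This is easy to check from the REPL. The sketch below uses the Dates standard library instead of CSV so that it runs without installing anything; ModA and ModB are made-up module names.

```julia
module ModA
using Dates
this_year() = Dates.year(Dates.today())
end # module ModA

module ModB
using Dates
end # module ModB

# Both modules refer to the very same module object, not to two copies.
same_module = ModA.Dates === ModB.Dates
println(same_module)
```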

9 Likes

I suggest looking at the following as well:
Dependencies of src files inside a package

I am also creating a doc PR to address this issue (I don’t suggest reading the PR thread because it’s a bit hard to read).

Instead, I’ll insert the PR text directly here:

Inline text from PR

Julia ⇔ C/C++: Namespaces

  • C/C++ namespaces correspond roughly to Julia modules.
  • There are no private functions/variables/modules/… in Julia. Everything is accessible
    through fully qualified paths (or relative paths, if desired).
  • using MyNamespace::myfun (C++) corresponds roughly to import MyModule: myfun (Julia).
  • using namespace MyNamespace (C++) corresponds roughly to using MyModule (Julia)
    • In Julia, only exported symbols are made available to the calling module.
    • In C++, only elements found in the included (public) header files are made available.
  • Caveat: import/using keywords (Julia) also load modules (see below).
  • Caveat: import/using (Julia) works only at the global scope level (modules)
    • In C++, using namespace X works within arbitrary scopes (ex: function scope).

Julia ⇔ C/C++: Module loading

  • When you think of a C/C++ “library”, you are likely looking for a Julia “package”.
    • Caveat: C/C++ libraries often house multiple “software modules” whereas Julia
      “packages” typically house one.
    • Reminder: Julia modules are global scopes (not necessarily “software modules”).
  • Instead of build/make scripts, Julia uses “Project Environments” (sometimes called
    either “Project” or “Environment”).
    • Build scripts are only needed for more complex applications
      (like those needing to compile, or download C/C++ executables :slight_smile: ).
    • C/C++ code typically target more conventional applications, whereas Julia
      “Project Environments” provide a set of packages to experiment with particular problem
      spaces. Julia users typically use problem-specific “scripts” for this type of experimentation.
    • To develop a “conventional” application/project in Julia, you can initialize its root directory
      as a “Project Environment”, and house application-specific code/packages there.
      This provides good control over project dependencies, and future reproducibility.
    • Available packages are added to a “Project Environment” with the pkg> add tool
      (This does not load said package, however).
    • The list of available packages (direct dependencies) for a “Project Environment” are
      saved in its Project.toml file.
    • The full dependency information for a “Project Environment” is auto-generated & saved
      in its Manifest.toml file.
  • Packages (“software modules”) available to the “Project Environment” are loaded with
    import or using.
    • In C/C++, you #include <moduleheader> to get object/function declarations, and link in
      libraries when you build the executable.
    • In Julia, whatever is loaded is available to all other loaded modules through its
      fully qualified path (no header file required).
    • Use import SomePkg: SubModule.SubSubmodule (Julia) to access package submodules.
  • Directory-based package repositories (Julia) can be made available by adding repository
    paths to the Base.LOAD_PATH array.
    • Packages from directory-based repositories do not require the pkg> add tool prior to
      being loaded with import or using. They are simply available to the project.
    • Directory-based package repositories are the quickest solution for developing local
      libraries of “software modules”.

Julia ⇔ C/C++: Assembling modules

  • In C/C++, .c/.cpp files are compiled & added to a library with build/make scripts.
    • In Julia, import [PkgName]/using [PkgName] statements load [PkgName].jl located
      in a package’s [PkgName]/src/ subdirectory.
    • In turn, [PkgName].jl typically loads associated source files with calls to
      include("[someotherfile].jl").
  • include("./path/to/somefile.jl") (Julia) is very similar to
    #include "./path/to/somefile.jl" (C/C++).
    • However include("...") (Julia) is not used to include header files (not required).
    • Do not use include("...") (Julia) to load code from other “software modules”
      (use import/using instead).
    • include("path/to/some/module.jl") (Julia) would instantiate multiple versions of the
      same code in different modules (creating distinct types (etc.) with the same names).
    • include("somefile.jl") is typically used to assemble multiple files within the same
      Julia package (“software module”). It is therefore relatively straightforward to ensure
      files are included only once (no #ifdef confusion).
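The point about multiple instantiation can be reproduced in a few lines. In this sketch the file point.jl and the modules M1 and M2 are hypothetical, and Base.include(mod, path) is used only so that everything fits in one script.

```julia
# A throwaway file defining a single type.
dir = mktempdir()
path = joinpath(dir, "point.jl")
write(path, "struct Point\n    x::Int\n    y::Int\nend\n")

module M1 end
module M2 end

# Evaluate the same file into two different modules.
Base.include(M1, path)
Base.include(M2, path)

# Same name, same source text, but two unrelated types.
same_type = M1.Point === M2.Point
p = M1.Point(1, 2)
accepted_by_m2 = p isa M2.Point
println((same_type, accepted_by_m2))
```

A method written for M2.Point will therefore not accept a value built with M1.Point, which is why shared code is better loaded once and then reached with import/using.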

Julia ⇔ C/C++: Module interface

  • C++ exposes interfaces using “public” .h/.hpp files whereas Julia modules export
    symbols that are intended for their users.
    • Often, Julia modules simply add functionality by generating new “methods” to existing
      functions (ex: Base.push!).
    • Developers of Julia packages therefore cannot rely on header files for interface
      documentation.
    • Interfaces for Julia packages are typically described using docstrings, README.md,
      static web pages, …
  • Some developers choose not to export all symbols required to use their package/module.
    • Users might be expected to access these components by qualifying functions/structs/…
      with the package/module name (ex: MyModule.run_this_task(...)).
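As a small illustration of the export/qualified-name distinction (MyModule, run_this_task and _prepare are placeholder names, not part of any real package):

```julia
module MyModule
export run_this_task

# Exported: meant for users of the module.
run_this_task(x) = _prepare(x) + 1

# Not exported: still reachable, but only through the qualified name.
_prepare(x) = 2x
end # module MyModule

using .MyModule

exported_result = run_this_task(3)        # available unqualified after `using`
internal_result = MyModule._prepare(3)    # must be qualified
public_names = names(MyModule)            # exported names only
println(public_names)
```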

Julia ⇔ C/C++: Quick reference

| Software concept | Julia | C/C++ |
| --- | --- | --- |
| unnamed scope | begin ... end | { ... } |
| function scope | function x() ... end | int x() { ... } |
| global scope | module MyMod ... end | namespace MyNS { ... } |
| software module | A Julia “package” | .h/.hpp files + compiled somelib.a |
| assembling software modules | SomePkg.jl: … include("subfile1.jl") include("subfile2.jl") | $(AR) *.o ⇒ somelib.a |
| import software module | import SomePkg | #include <somelib> + link in somelib.a |
| module library | LOAD_PATH[], *Git repository, **custom package registry | more .h/.hpp files + bigger compiled somebiglib.a |

* The Julia package manager supports registering multiple packages from a single Git repository.

* This allows users to house a library of related packages in a single repository.

** Julia registries are primarily designed to provide versioning & distribution of packages.

** Custom package registries can be used to create a type of module library.

9 Likes

Thanks for sharing your view on this.

Comments can and will easily become obsolete, as they are not checked by the compiler at all. Thus one cannot rely on them.
And I didn’t propose to list all names in every import, just the opposite! Even without such lists it’s pretty obvious that makeunique and normalizename likely lie in .utils, and not in Parsers.
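A minimal sketch of what I mean, using plain submodules rather than PatModules.jl. MyPkg, Utils, Parsers and the function bodies are stand-ins I made up; only the names makeunique and normalizename come from the example above.

```julia
module MyPkg

module Utils
export makeunique, normalizename
# Hypothetical implementations, just enough to run.
makeunique(names) = unique(names)
normalizename(s) = lowercase(strip(s))
end # module Utils

module Parsers
# One checked line instead of a comment: Julia itself resolves these names.
using ..Utils
parse_header(line) = makeunique(normalizename.(split(line, ',')))
end # module Parsers

end # module MyPkg

header = MyPkg.Parsers.parse_header("A, b ,a")
println(header)
```

If Utils is renamed or removed, the using line fails when the package is loaded, whereas a stale comment would go unnoticed.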

But why would you want to go the “MATLAB way” in Julia? Typically there are many short functions, and it’s just crazy to put each into a separate file.

Also, my previous message here presented one point of view: that of a user who reads package code and wants to understand something there. Below is my view as a “package developer”.

Take for example a package of mine: Alexander Plavin / SquashFS.jl · GitLab. The source file api.jl contains main functions that are supposed to be called by users - the package public API. E.g. SquashFS.open, SquashFS.readdir, etc. Other files, such as utils.jl and sqfs_structs.jl, contain lower-level functions that are supposed to be internal.
Initially (and currently) I follow the typical/suggested Julian approach: don’t make any separate modules within the package. But this means all the internal functions are available as SquashFS.xxxxx and show up in autocomplete, confusing users. If every source file were implicitly “module-like”, I would just put something like using .api into the main SquashFS.jl file. Internal functions would then automatically be accessible as SquashFS.utils.xxxxx.
As this is not the case, I’m thinking of creating a submodule like Internal manually, but it’s not really clear to me where to define it. And as I understand from this thread, submodules within a single package are actually discouraged…

They are perfectly fine. Segregating code into lots of submodules may be a bad idea, and submodules that are usable elsewhere might be better as packages, but a single Internal submodule for namespace purposes is not unreasonable.

Just create a file Internal.jl with module Internal ... end that includes your other files. Then in your main SquashFS.jl file do include("Internal.jl") and access its names as Internal.foo (or do using .Internal to get any exports).
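Collapsed into a single file so it can be pasted into the REPL, the layout could look roughly like this. SquashFSDemo, magic and first_bytes are invented for the sketch and are not the real SquashFS.jl API; in a real package the body of Internal would come from include("Internal.jl").

```julia
module SquashFSDemo   # hypothetical stand-in for the package module
export magic

# In a real package this block would live in src/Internal.jl.
module Internal
first_bytes(data, n) = data[1:n]
end # module Internal

# Public function; reaches the helper through the Internal namespace.
magic(data::Vector{UInt8}) = String(Internal.first_bytes(data, 4))

end # module SquashFSDemo

tag = SquashFSDemo.magic(Vector{UInt8}("hsqs0000"))
println(tag)
```

The helper no longer shows up directly under SquashFSDemo, only under SquashFSDemo.Internal.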

9 Likes

See for example in the TOML stdlib:

5 Likes

An alternative way to mark private functions is to prepend an underscore (e.g. _my_private_function(x)). This convention is widely used in Python and partly also in Julia.

3 Likes

The issue is not double inclusion, but the inclusion order. This can cause a lot of mental burden and should be handled by the compiler, since the compiler knows which module depends on which. For example, when you have

include("B.jl")
include("A.jl")

where module B depends on A, the manual include will result in an undefined-variable error. But if we implement issue 4600, then we can just write something like this (the syntax is not decided yet):

using .B from "B.jl"
using .A from "A.jl"

or

using .A from "A.jl"
using .B from "B.jl"

Then the order does not matter anymore, since the compiler will evaluate module A when loading B. The include order is something the programmer should not, and does not, need to care about: the compiler has sufficient information to infer it, so manual include just increases the programmer’s burden.
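The current behaviour with manual include can be reproduced with two throwaway files. Everything here (file contents, the WrongOrder and RightOrder wrapper modules) is made up for the demonstration, and Base.include(mod, path) is used only to keep the two attempts apart in one script.

```julia
dir = mktempdir()
write(joinpath(dir, "A.jl"), "module A\nanswer() = 42\nend\n")
# B needs A while B itself is being loaded.
write(joinpath(dir, "B.jl"), "module B\nusing ..A\nconst cached = A.answer()\nend\n")

module WrongOrder end
module RightOrder end

# B before A: loading B fails because A does not exist yet.
wrong_order_failed = try
    Base.include(WrongOrder, joinpath(dir, "B.jl"))
    false
catch err
    true
end

# A before B: works.
Base.include(RightOrder, joinpath(dir, "A.jl"))
Base.include(RightOrder, joinpath(dir, "B.jl"))
cached = RightOrder.B.cached
println((wrong_order_failed, cached))
```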


I don’t think relative module loading is actually discouraged. In fact, I believe file loading should be handled by the compiler; programmers should not have to manage the order and dependencies of files themselves. If it’s something the machine can easily do, why are we humans doing it?

6 Likes
    module Printer
        include("print.jl")
    end

I’m curious: why are the module and end lines in what I’ll call the ‘parent’ file, rather than being the first and last lines of print.jl?

Also just want to say that I’m learning a lot reading this thread!

1 Like

No good reason really. I think it is mostly because the original file came from somewhere else and I wanted to minimize the diff against that. But it doesn’t really make sense, you are right.

2 Likes

I never had a problem with what PatModules is trying to solve (I’ve used the utils.jl submodule pattern mentioned above to good effect), but I think there are some larger themes here that were mentioned in passing that deserve attention:

But in a math book I will often refer to other parts. When an important Lemma is used a book will tell me “using Lemma 4.5.1 (Abraham’s Lemma)” where 4.5.1 will mean Chapter 4, section 5 first Lemma, enabling me to quickly glance back. It would be a disaster if a book would just use names and assume I have memorized everything that has come before. Further, glancing back I can check what the assumptions and result of Lemma 4.5.1 are without having to read or understand the proof. I can build on it without even looking at the internals. Which is what @Skoffer mentions here:

To me the single largest issue with reading (and designing) Julia code is that I don’t know (and can’t specify) what assumptions the author made. This includes understanding what context the code I am looking at will actually run in. Existing tools do not solve this (short of reaching for the debugger and actually running the code). Due to highly generic code, and due to assumptions being encoded in documentation only (often slightly wrong or out of date, as documentation is hard to test), the IDE tools will often lack the context to jump to the right method when I ask them to jump to the definition of a function being used.

This is not really an issue if I am working deeply on a package. Then I will read the code start to finish and have a good working memory of the overall layout and where to get the info I need. Even then I believe Julia could do a lot better at supporting people in structuring code well, but in this context it’s doable. But if I am building on top of a 400k line universe of code that is highly modular and intricate, then that is not a feasible strategy.

The specific combination of features of Julia has brought amazing things that are completely inconceivable in other languages I know. But they do pose real challenges, too.

A final analogy: At one point “be a better and more disciplined developer” was considered an appropriate rebuke to people objecting to C-style memory management. I feel like this is where we are at with code organization/structure in Julia right now.

A happy new year to you all!

7 Likes

Highly generic code has its own issues that I don’t think can be captured in the code. The way people try to capture it in the code would be like Haskell’s type system, but there are plenty of examples which would immediately fail if trying to use derived types (units is always one tricky example which breaks simple things like f(u::T,p,t)::T). So anything in the code would only break examples, other than the code itself (i.e. looking at it and saying “the scalars should support sqrt, …”). I’m not sure there’s a better solution than mass combinatoric testing, other than imposing an interface that locks out otherwise compatible types.

3 Likes

Why would an interface lock out compatible types? You need a set of methods anyway, so just have a trait that tags a type or value as implementing those methods.

See the example. Common descriptions of interfaces usually don’t just lay out what methods are required but also have restrictions on the types in the methods. But that can be tricky. You may think that f(u::T,p,t)::T is required for u, and it needs sqrt(u::T)::T, etc. but then units are the counter example: sqrt(u) when you have m^2 returns m, and in an ODE f(u,p,t) should output something with units u/t because it’s a rate. So the details in some cases on what can work in terms of types is very difficult or impossible to explain in generality, meaning that if you don’t want to lock useful cases out, then you want to just leave it at functions.
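To make the units counter-example concrete, here is a toy sketch. Quantity is a deliberately simplistic, made-up unit type (it only tracks a power of metres and assumes even powers under sqrt), and strict_norm / loose_norm are hypothetical function names; this is not how Unitful.jl or any real units package is implemented.

```julia
# Toy unit type: P is the power of metres (P = 2 means m^2).
struct Quantity{P}
    value::Float64
end

# sqrt halves the unit power, so the result has a different type.
# (Toy assumption: P is even.)
Base.sqrt(q::Quantity{P}) where {P} = Quantity{P ÷ 2}(sqrt(q.value))

# "Interface" that insists the output has the same type as the input.
strict_norm(u::T) where {T} = sqrt(u)::T

# Same computation with no restriction on the return type.
loose_norm(u) = sqrt(u)

area = Quantity{2}(4.0)
loose_result = loose_norm(area)
strict_rejected = try
    strict_norm(area)
    false
catch err
    err isa TypeError
end
println((loose_result, strict_rejected))
```

Plain numbers satisfy the strict signature, while the unitful value is locked out even though the computation itself is perfectly meaningful.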

But you don’t want to mention all functions (Base.broadcasted(…)), so you instead try to document it at a very coarse level, “it should support sqrt, *, + on the element types”, which of course has some extra undocumented assumptions on type calculations and properties under broadcast and … It can go pretty deep too: you can say “should be an AbstractArray”, but in reality things that broadcast don’t need to be an AbstractArray, and implementing broadcast is separate from implementing “indexing” (example: CuArrays). So what it comes down to is that with every reasonable description of the interface I’ve thought of, there are useful edge cases which violate it, meaning that documentation of interfaces is more of a heuristic than a hard boundary.

That’s continuing to improve in SciML as we build out:

And solidify higher order properties like “whether A has fast scalar indexing”, but it’s still in progress.
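For readers who have not seen this style, a hand-rolled sketch of such a property expressed as a trait follows. None of these names (IndexingCost, indexing_cost, DeviceArrayStandIn, total) are taken from the SciML packages; they are assumptions made up for the illustration.

```julia
# Hypothetical trait: is element-by-element access cheap for this container?
abstract type IndexingCost end
struct FastScalarIndexing <: IndexingCost end
struct SlowScalarIndexing <: IndexingCost end

# Default: assume fast; individual types opt out.
indexing_cost(::Type) = FastScalarIndexing()

# Made-up stand-in for a GPU-like container. Note it is not an AbstractArray.
struct DeviceArrayStandIn
    data::Vector{Float64}
end
indexing_cost(::Type{DeviceArrayStandIn}) = SlowScalarIndexing()
Base.sum(a::DeviceArrayStandIn) = sum(a.data)  # a "bulk" operation

# Dispatch on the trait rather than on the abstract type hierarchy.
total(x) = total(indexing_cost(typeof(x)), x)
function total(::FastScalarIndexing, x)
    s = zero(eltype(x))
    for i in eachindex(x)
        s += x[i]
    end
    return s
end
total(::SlowScalarIndexing, x) = sum(x)

println(total([1.0, 2.0, 3.0]))
println(total(DeviceArrayStandIn([1.0, 2.0])))
```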

11 Likes

Wow yes, this is much more involved than I had assumed. Thanks for laying it out

Thanks for the detailed explanation @ChrisRackauckas of what you’ve thought about on this front. C++ Concepts are the solution the C++ standard committee arrived at. They are carefully designed so that you can retrofit them to a library without breaking any existing uses of the library.

I have to disagree with your example, though. If my code assumes that a square root with the type signature sqrt(u::T)::T exists, then type-based unit systems will break my code, and attempting to use units with my code will produce a sane error message. If my code doesn’t assume it, I have provided an incorrect interface (so open an issue and we can fix it). Overly conservative type constraints can be an issue in some languages, but seeing how far Julia errs on the other side, I doubt it would be too bad…

The whole situation with AbstractArray, which is an abstract type but should be an interface (e.g. has_index), demonstrates exactly this problem with Julia’s design, and any new ideas need to work around it (and especially around the fact that if you use AbstractArray, then nothing else can be encoded at the AbstractType level…).

3 Likes