Implicitly loaded modules in the future?

I completely agree with you that this debate has been conflating many important questions about code loading, dependency resolution, and module systems. These questions are not completely orthogonal, though: the current code loading mechanism, for example, dictates a lot of what people can and cannot do with modules.

The point you are making, that multiple dispatch works better with flat namespaces, is very interesting and has me seriously thinking. I am not fully convinced that a conflict between multiple dispatch and submodules is inevitable, but this should definitely be a topic for further debate.

Still, it remains that submodules are used by people because they fulfill a need and we should not be satisfied with dismissing them as an antipattern without offering alternatives. Concretely, I would be genuinely curious about better ways to organize AlphaZero.jl. I cannot see any better way under the current status quo and this is one reason some of the proposals that are being made are appealing to me.

Fortunately, there are many ways to enforce clearer internal dependencies without falling into the excesses of Java. I can promise you that I am as invested as you are into not turning Julia into a Java clone. :slight_smile:

6 Likes

It sounds like you’re implying that the people advocating for the status quo only write research code…

This is an error on my side. Please allow me to edit my original message.

2 Likes

This software is amazing! But since you asked: I don’t think you are using modules to their full advantage. For instance, I have found that having to explicitly state which names a module uses is invaluable for making the code in the module self-contained, which in turn makes it more legible and easier to understand. When you have files without modules, included who knows where, the functionality used in those files is wide open to outside influences, and the code inside them is harder to understand.

4 Likes

I have no idea how feasible this is, but it would be cool if a future version of Julia allowed referring to types that haven’t been defined yet. It would be especially cool if this allowed defining mutually recursive types.
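To make the wish concrete, here is a sketch with hypothetical type names. The first form is what a forward reference would allow (today it fails with an UndefVarError because Edge is not yet defined when Node is evaluated); below it is the abstract-type workaround that works today:

# What forward references would allow (errors today):
#
#     struct Node
#         edges::Vector{Edge}   # Edge is not defined yet
#     end
#
#     struct Edge
#         from::Node
#         to::Node
#     end

# The workaround available today routes one direction through an abstract type:
abstract type AbstractEdge end

struct Node
    edges::Vector{AbstractEdge}   # loses the concrete element type
end

struct Edge <: AbstractEdge
    from::Node
    to::Node
end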

If we had this feature, then all of the code in a package could be declarative. (And there could be an option for an __init__.jl file that runs any imperative code that is needed after all of the declarative code has been loaded). This probably wouldn’t improve the discoverability of code dependencies, but then you wouldn’t have to worry about manually ordering include statements anymore. Any order of include statements would work.

On the other hand, this could make discovering the code dependencies by inspection even harder… :joy:

4 Likes

This is of course a fair point in general. In this argument, I really tried to abstract myself from language-specific debates, and what I call “good software engineering practices” could be summarized as dividing a package into many small, loosely coupled semantic units with local definitions and a clear dependency graph. It is my opinion that the current status quo does not make this as easy and natural as it should be.

5 Likes

It is a pity the issue https://github.com/JuliaLang/julia/issues/4600 features no proper discussion. Any dissenting voice is slapped down (from about a week ago). As @jonathan-laurent pointed out, it would be a good discussion to have. Obviously, with all opinions heard.

Edited: The time allusion was wrong. My sincere apologies to the contributors to this thread.

This seems to be a big component of the trepidation about allowing non-include-based code loading (e.g. FromFile style, or whatever syntactic construct shakes out of #4600). I think it’s good to look at how other kindred languages (i.e. not Java and Python) handle it.

Ruby is perhaps the closest analogue to Julia. require is essentially Julia’s include, and from what I’ve seen Gems favour pulling require statements closer to the top level where applicable. However, Ruby also has include, which functions much like the proposed syntax in #4600. I should note that Ruby allows for re-definition of classes as well.

Has that led to a phenomenon of catastrophic nesting? Well, if we take a look at a large and mature Ruby codebase (GitLab), almost all files come in at <= 3 levels of folder nesting and <= 2 levels of module nesting. I looked at the first 5 large-ish Julia projects that came to mind and found <= 2 levels of folder nesting with <= 2 levels of module nesting.

Moving to languages with multiple dispatch, Common Lisp supports code loading via load (much like include) and require (much like using/import, but will load code if it isn’t already). For comparison, I pulled up the two largest projects I could think of (Clasp and SBCL). Here there was <= 2 levels of folder nesting and <= 1 level of module nesting.

Lastly, Clojure is an interesting example because it also has multiple dispatch and enforces a linear include order, but works with the dreaded Java namespace system. I looked at 3 projects I knew to be of reasonable complexity (Figwheel, Ring and Neanderthal, representing frontend/tooling, backend and scientific computing respectively). Despite liberal use of defmethod, there were <= 5 levels of folder nesting and <= 4 levels of namespace nesting.

So in conclusion, allowing for code loading facilities more complex than include has not resulted in a modulepocalypse in well-liked, general-purpose languages (i.e. not Matlab) which share some common design philosophy with Julia. Given this, I think it falls on those who do not want any changes to the status quo to provide evidence of (non-contrived) scenarios where allowing something like #4600 would have an outsized negative impact.

10 Likes

And conversely, for the proponents to show where the current mechanism fails so badly that something else is needed. And to demonstrate that the benefits outweigh yet another layer of complexity.

4 Likes

Well, imo you should use submodules when the “surface area” of the API the module provides is small. In that case you hide a bunch of implementation from other users of that submodule. If you have to export everything anyway, then yes, using a submodule didn’t really buy you anything. I think Documenter.jl does a good job with submodules.

13 Likes

Just a point of clarification… I’m not certain which of the following corresponds to 3 levels of folder nesting:

src/Folder1/Folder2/file.jl
src/Folder1/Folder2/Folder3/file.jl
src/Folder1/Folder2/Folder3/Folder4/file.jl

It seems like any one of those could be considered “3 levels of folder nesting”, depending on how we define folder nesting. :slight_smile:

If we could (edit: we can :slight_smile:) do what I posted in this other thread, at the cost of having a “directory->module” correspondence and a Project.toml file for each module, that could be achieved without any new language feature (and, from what I see, with the added benefit of expanding the possible uses of each module beyond the limited scope of one package):

2 Likes

I agree with @CameronBieganek: one shouldn’t cargo-cult best practices; it’s important to find out what works best for Julia. “People coming from Python find this unintuitive” is somewhat persuasive based on Julia user demographics & trends, but a more objective reason would be more compelling.

However, @jonathan-laurent has expressed a pretty convincing downside to the include pattern, IMO. The key issue is this awkward ‘M’-shaped access pattern that one has to go through when trying to wrap one’s head around a new codebase. For instance, my first experience with a new code base is when I run a tutorial and I manage to cause an error. When I start debugging, my entry point is generally inside some included file, so I have to infer where that got included, hop up to that level, then hop down again into some other included file, …, to do reaching definition analysis. This seems to be objectively somewhat more difficult than one would like.

I’m trying to understand why include seems to break my expectations and intuition about how name resolution works. I think it boils down to the fact that every file in Python effectively also defines a namespace, as long as I always use import and never exec. The equivalent in Julia would be putting all the included code within modules inside the files, IIUC.

Here’s a concrete example of what bugs me: suppose I put this in “MyCalc.jl”

function calculate(x)
   x*subcalc(x)
end

Then I fire up Julia:

julia> subcalc(x) = x^2
julia> include("MyCalc.jl")
julia> calculate(2)
8

I’m used to the equivalent of this being an error in Python. It would look like:

def calculate(x):
   return x*subcalc(x)

and then

>>> from calculate import calculate
>>> def subcalc(x):
...     return x**2
...
>>> calculate(2)
Traceback (most recent call last):
  ...
  File "/Users/.../calculate.py", line 10, in calculate
    return x*subcalc(x)
NameError: name 'subcalc' is not defined

I like the discipline this provides. I have three ways to approach this in Python:

  1. Put the two definitions in the same file, when they belong together.
  2. Ensure that I import the definition of subcalc into calculate.py so it gets closed over, when subcalc needs to be shared by other modules.
  3. Explicit dependency injection: redefine calculate to take subcalc as a function-type argument, when I want to dynamically change how calculate works (see the sketch after this list).
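As a rough Julia sketch of that third option, reusing the hypothetical names from the example above:

# calculate.jl: the dependency is passed in explicitly as an argument
calculate(x, subcalc) = x * subcalc(x)

# the caller decides which subcalc to use
calculate(2, x -> x^2)   # returns 8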

What Python doesn’t let me do (w/o exec) is what one might call “implicit dependency injection”: where subcalc is left undefined inside calculate.py and then finds its way in, from the importing context, through the back door. This seems pathological to me from the perspective of reasoning about the code locally, but maybe it’s b/c I’m used to thinking in Blub and haven’t really grokked the implications of multiple dispatch yet.

Three questions I have for more experienced Julia developers:

  • Are there other use cases where that “implicit dependency injection” really comes in handy, beyond package structuring?
  • For the specific case of package structuring, am I correct in summarizing the arguments against each of the 3 options as:
  1. Putting all the code in one file is unergonomic
  2. Putting the code into modules inside each file means boilerplate headers and makes refactoring difficult
  3. Explicit dependency injection is awkward, because you’d just end up writing a wrapper in calculate.jl to close over subcalc?
  • Are there situations where you would want calculate to use multiple definitions of subcalc? (Maybe the answer is ‘always, b/c multiple dispatch’? But I’m not sure if having multiple methods is different from having multiple functions…not a meaningful distinction in Python anyway!)

My gut is telling me that this “implicit dependency injection” capability is a recipe for bugs, because it’s too powerful for the purpose of structuring packages: it seems like you could play some really counterintuitive tricks with it, but its intended use case here seems like a small subset of what it could actually be used to do.
[Edit: I can’t count…]

4 Likes

There’s no magical “implicit dependency injection” going on here. All that’s happening is that the code for a single module is split between more than one file. If you were to concatenate the files and include only the concatenated file in the module, the results would be exactly the same.

To use your example, the following two scenarios are exactly equivalent:

Scenario 1

module A

function calculate(x)
   x*subcalc(x)
end

subcalc(x) = 2x

end

Scenario 2

# file A.jl
module A

include("calculate.jl")
include("subcalc.jl")

end
# file calculate.jl
function calculate(x)
   x*subcalc(x)
end
# file subcalc.jl
subcalc(x) = 2x

5 Likes

Does anyone ever write conditional include statements? Or nested includes? Or including the same thing in multiple places? If I knew I could assume that a given file was always included exactly once, in the same place, then I would feel like it was equivalent to just copy/pasting the code in. My guess is that people probably don’t do crazy stuff (at least not intentionally!), but my point is that from within the file there’s no way to tell that it won’t happen.

I suppose also that the indirection of the package management system means that people won’t be reaching out to include something from inside a package they’re importing, and that the package wraps the included code in a module as well, so it’s not just getting dropped “bare” into someone else’s code when they import the package. I think you’re telling me it’s safe to make assumptions I wouldn’t a priori know that I could make.

I imagine a response to my concern might be: “A bare file like this isn’t meaningful until it is included somewhere - it’s just a half-finished thought. There’s no point trying to reason about it in isolation – stop trying to take that perspective, and just look at the system from the top, where the include statement happens. Everything makes sense this way.”

I have two responses to that hypothetical:

  • If there is a mechanism in Julia that ensures any given file is included once and only once, then I’m happy, b/c that mechanism can also be used mechanically to locate where this include is happening and take me to it deterministically.
  • If included files can’t be understood in isolation, then they shouldn’t be isolated.

2 Likes

Could someone say why it’s useful to have a single module split between multiple files? If a module is a single logical unit, what’s the point of using multiple files? If the other files aren’t part of the same logical unit, I can use multiple submodules.

1 Like

I’m a person who used Python heavily and had just learned some C++ before using Julia.

To me, Python is a world where files and folders are modules and packages, and you import whatever you need to use. Whether it is your own code or someone else’s package, within every single file you know where the code is coming from.
People coming from Python are used to having A.B inside A because something else in A uses A.B.

C++ people, on the other hand, #include everything; it feels like copying and pasting everything into a single file and compiling it. There are just files and includes.
Julia’s system looks similar to C++20’s module system, which is not yet in wide use in the C++ world, plus a package system.

In practice, after reading several mature Julia projects, I feel that most Julia packages follow the old C++ way. People just don’t use submodules; their package has one single module which includes all the .jl files within it. They keep things simple within a package so that any complicated dependency problems can be handled by Pkg.jl.

Even when they do use submodules, like CUBLAS within CUDA.jl, it is not what people coming from Python would expect, i.e. something in CUDA.jl requiring the submodule CUBLAS. In fact the submodule CUBLAS is quite independent: it does using ..CUDA, while CUDA merely includes CUBLAS and exports it.
So there is A.B, but B feels like an independent package (actually a module) written inside A; A doesn’t use B at all, it just exports it. Everything that A actually uses is just files getting included.
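A simplified sketch of that shape (hypothetical placeholder code, not the actual CUDA.jl source):

# CUDA.jl (parent module)
module CUDA

driver_version() = v"11.0"   # placeholder for the parent's own functionality

include("CUBLAS.jl")         # defines the submodule
export CUBLAS                # the parent re-exports it but never calls into it

end

# CUBLAS.jl (submodule that reads like an independent package living inside CUDA)
module CUBLAS

using ..CUDA: driver_version   # the dependency points up to the parent

gemm_supported() = driver_version() >= v"10.1"

end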

This mismatch with how people think about A.B is the source of the confusion for people coming from Python.
For example, Python people would make utils.jl a module, and it feels very bad to have to include("utils.jl") and then using .Utils; and since there may be several places that use it, the same file would end up included several times.
But in every Julia package I have seen, utils.jl is just a file, not a module, and it is included once with include("utils.jl") at the top of the main module.

To me there is some redundancy in having file/module/package as three independent things; it gives some people freedom, but it confuses others. And in real-world Julia packages, you don’t really see people using the module concept extensively. Mostly people use files within a package, and use packages for the real dependency problems.

I would suggest clarifying this mindset difference in the noteworthy-differences part of the docs, where there is currently just a single line: “The logical Julia program structure (Packages and Modules) is independent of the file structure (include for additional files), whereas the Python code structure is defined by directories (Packages) and files (Modules).”
People coming from Python don’t understand what it really means for package/file/module to be independent things, nor how to manage code in such a setting.

18 Likes

Yes, but it’s rather uncommon. I’ve mostly seen it and used it myself in conjunction with the Requires package. Otherwise it should mostly be conditioned on platform or availability of external dependencies.
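For instance, a platform-conditioned include looks something like this (hypothetical file names), and with Requires.jl the glue code for an optional dependency is included lazily from __init__:

module MyPkg

# load one backend or the other depending on the platform (hypothetical files)
if Sys.iswindows()
    include("backend_windows.jl")
else
    include("backend_posix.jl")
end

# With Requires.jl, glue code is only included if the other package gets loaded;
# the UUID must be the real UUID of the @require'd package:
#
#     using Requires
#     function __init__()
#         @require SomePkg="<SomePkg's UUID>" include("somepkg_glue.jl")
#     end

end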

Or nested includes?

This is somewhat common for large packages. Especially if submodules are used it’s kinda natural to let those handle the includes of their own files.

Or including the same thing in multiple places?

This is rare but I’ve seen it in the wild. I’m not convinced it has ever been a good design though.

I suppose also that the indirection of the package management system means that people won’t be reaching out to include something from inside a package they’re importing

I’ve never seen this and it would be a very questionable thing to do.

3 Likes

Even if various parts form a coherent whole, it might make sense to split code into multiple files for easier code navigation (i.e. no distractions from other parts when you are working on something).

This is of course a matter of taste, people should do what they like.

8 Likes

For overview, and because to some extent it reduces version control conflicts when multiple people are working on the code.

Anecdotally I recently inherited some Python code at work which included a 1600 line class. It didn’t do anything egregious and outsourced a lot of its functionality to other files. It was simply big and had somewhat ambitious docstrings. It was, however, very difficult to get an overview of the code. Eventually I expect I can refactor it to be more manageable but short term I’m rather missing the option to split it into 3 or 4 files (based on sub-functionality) and just include the pieces to stitch them together.

9 Likes

You’re thinking in Blub :wink:

Files have no meaning in the structure of a Julia program (besides the main PackageName.jl). They are just a way of breaking up code, because 5000-line files are hard to grok and annoying to use with git.

If your subcalc method is defined or imported anywhere in a module then it is available everywhere in the module. There is no “back door” because all the files included in a module are in the same house already. And there is no “implicit dependency injection” happening.

It does work the way you want if you use modules rather than just files (see the sketch below). But mostly we don’t do that because (1) we just reuse the same method names everywhere and have few namespace collisions, and (2) packages are the level of modularity. To illustrate: in some places I have a dependency chain of five packages that I maintain, but a maximum module depth of 1 inside any of them.
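A minimal sketch of that module-per-file style, reusing the hypothetical names from the earlier example:

# MyPkg.jl
module MyPkg

include("subcalc.jl")     # defines MyPkg.SubCalc
include("calculate.jl")   # defines MyPkg.Calculate, which names its dependency

end

# subcalc.jl
module SubCalc
subcalc(x) = x^2
end

# calculate.jl
module Calculate
using ..SubCalc: subcalc   # the dependency is now explicit and local to this file
calculate(x) = x * subcalc(x)
end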

I suppose also that the indirection of the package management system means that people won’t be reaching out to include something from inside a package they’re importing, and that the package wraps the included code in a module as well, so it’s not just getting dropped “bare” into someone else’s code when they import the package. I think you’re telling me it’s safe to make assumptions I wouldn’t a-priori know that I could make.

Additionally, you can make this assumption a priori. A file is only dropped bare into your code when you manually call include. import imports a module, not raw code. Files and modules are not related.

10 Likes