Implicitly loaded modules in the future?

It is a pity the issue https://github.com/JuliaLang/julia/issues/4600 features no proper discussion. Any dissenting voice is slapped down (from about a week ago). As @jonathan-laurent pointed out, it would be a good discussion to have. Obviously, with all opinions heard.

Edited: The time allusion was wrong. My sincere apologies to the contributors to this thread.

This seems to be a big component of the trepidation for allowing non-include based (e.g. FromFile style, or whatever syntactic construct shakes out of #4600) code loading. I think it’s good to look at how other kindred languages (i.e. not Java and Python) handle it.

Ruby is perhaps the closest analogue to Julia. require is essentially Julia’s include, and from what I’ve seen Gems favour pulling require statements closer to the top level where applicable. However, Ruby also has include, which functions much like the proposed syntax in #4600. I should note that Ruby allows for re-definition of classes as well.

Has that led to a phenomenon of catastrophic nesting? Well, if we take a look at a large and mature Ruby codebase (GitLab), almost all files come in at <= 3 levels of folder nesting and <= 2 levels of module nesting. I looked at the first 5 large-ish Julia projects that came to mind and found <= 2 levels of folder nesting with <= 2 levels of module nesting.

Moving to languages with multiple dispatch, Common Lisp supports code loading via load (much like include) and require (much like using/import, but will load code if it isn’t already). For comparison, I pulled up the two largest projects I could think of (Clasp and SBCL). Here there was <= 2 levels of folder nesting and <= 1 level of module nesting.

Lastly, Clojure is an interesting example because it also has multiple dispatch and enforces a linear include order, but works with the dreaded Java namespace system. I looked at 3 projects I knew to be of reasonable complexity (Figwheel, Ring and Neanderthal, representing frontend/tooling, backend and scientific computing respectively). Despite liberal use of defmethod, there were <= 5 levels of folder nesting and <= 4 levels of namespace nesting.

So in conclusion, allowing for code loading facilities more complex than include has not resulted in a modulepocalypse in well-liked, general-purpose languages (i.e. not Matlab) which share some common design philosophy with Julia. Given this, I think it falls on those who do not want any changes to the status quo to provide evidence of (non-contrived) scenarios where allowing something like #4600 would have an outsized negative impact.

And conversely, for the proponents to show where the current mechanism fails so badly that something else is needed. And to demonstrate that the benefits outweigh yet another layer of complexity.

Well, imo you should use submodules when the “surface area” of the API the module provides is small. In that case you hide a bunch of implementation from other users of that submodule. If you have to export everything anyway, then yes, using a submodule didn’t really buy you anything. I think Documenter.jl does a good job with submodules.
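
To make the “small surface area” idea concrete, here is a minimal sketch with a hypothetical ConfigParser submodule (the module, its helpers and the config format are made up for illustration and are not taken from Documenter.jl): only one function is exported, and the implementation helpers stay internal.

```julia
# Hypothetical submodule with a small public surface: one exported function,
# with the implementation details kept internal.
module ConfigParser
export parse_config

# Internal helpers: not exported, so `using .ConfigParser` does not bring them in.
strip_comment(line) = first(split(line, '#'))
split_pair(line) = strip.(split(line, '='; limit = 2))

# The only public entry point: parse "key = value" lines into a Dict.
function parse_config(text)
    result = Dict{String,String}()
    for raw in split(text, '\n')
        line = strip(strip_comment(raw))
        isempty(line) && continue        # skip blank and comment-only lines
        key, value = split_pair(line)
        result[key] = value
    end
    return result
end

end # module ConfigParser

using .ConfigParser
parse_config("a = 1\n# a comment\nb = two  # trailing comment\n")
```

Code that does using .ConfigParser only sees parse_config; if every helper had to be exported anyway, the submodule would not be hiding anything.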

Just a point of clarification… I’m not certain which of the following corresponds to 3 levels of folder nesting:

src/Folder1/Folder2/file.jl
src/Folder1/Folder2/Folder3/file.jl
src/Folder1/Folder2/Folder3/Folder4/file.jl

It seems like either one of those could be considered “3 levels of folder nesting”, depending on how we define folder nesting. :slight_smile:

If we could (edit: we can :slight_smile:) do what I posted in this other thread, at the cost of having a “directory->module” correspondence and a Project.toml file for each module, that could be achieved without any new language feature (and, from what I see, with the added benefit of expanding the possible uses of each module beyond the limited scope of one package).

I agree with @CameronBieganek – one shouldn’t cargo-cult best practices; it’s important to find out what works best for Julia. “People coming from Python find this unintuitive” is somewhat persuasive based on Julia user demographics & trends, but a more objective reason would be more compelling.

However, @jonathan-laurent has expressed a pretty convincing downside to the include pattern, IMO. The key issue is this awkward ‘M’-shaped access pattern that one has to go through when trying to wrap one’s head around a new codebase. For instance, my first experience with a new code base is when I run a tutorial and I manage to cause an error. When I start debugging, my entry point is generally inside some included file, so I have to infer where that got included, hop up to that level, then hop down again into some other included file, …, to do reaching definition analysis. This seems to be objectively somewhat more difficult than one would like.

I’m trying to understand why include seems to break my expectations or intuition for how name resolution works. I think it boils down to the fact that every file in Python also defines a namespace effectively, if I always use import and never exec. The equivalent in Julia would be putting all the included code within modules inside the files, IIUC.
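
A minimal sketch of what that would look like, assuming a file that defines calculate in terms of a helper subcalc and wraps its contents in a module (the module name MyCalc is hypothetical): a subcalc defined outside the module is not visible inside it, so the call fails much like a Python NameError.

```julia
# Contents of a hypothetical "MyCalc.jl" wrapped in its own module.
module MyCalc
export calculate

function calculate(x)
    x * subcalc(x)   # `subcalc` is looked up in MyCalc's global scope, not Main's
end

end # module MyCalc

using .MyCalc

subcalc(x) = x^2     # defined in Main, so MyCalc never sees it

outcome = try
    calculate(2)
catch err
    err
end
println(outcome)      # an UndefVarError for subcalc
```

The error type is Julia's UndefVarError rather than Python's NameError, but the effect is the same: a definition in the caller's scope does not leak into the module.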

Here’s a concrete example of what bugs me: suppose I put this in “MyCalc.jl”

function calculate(x)
   x*subcalc(x)
end

Then I fire up Julia:

julia> subcalc(x) = x^2
julia> include("MyCalc.jl")
julia> calculate(2)
8

I’m used to the equivalent of this being an error in Python. It would look like:

def calculate(x):
    return x*subcalc(x)

and then

>>> from calculate import calculate
>>> def subcalc(x):
...     return x**2
...
>>> calculate(2)
  File "/Users/.../calculate.py", line 10, in calculate
    return x*subcalc(x)

NameError: name 'subcalc' is not defined

I like the discipline this provides. I have three ways to approach this in Python:

  1. Put the two definitions in the same file, when they belong together.
  2. Ensure that I import the definition of subcalc into calculate.py so it gets closed over, when subcalc needs to be shared by other modules.
  3. Explicit dependency injection: redefine calculate to take subcalc as a function-type argument, when I want to dynamically change how calculate works
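
Option 3 carries over directly to Julia. A minimal sketch, where the one-argument method with a squaring default is a hypothetical convenience wrapper rather than anything from the original code:

```julia
# Explicit dependency injection: the helper is passed in as an argument.
calculate(x, subcalc) = x * subcalc(x)

# Hypothetical convenience method that fixes a default helper (squaring).
calculate(x) = calculate(x, y -> y^2)

calculate(2)              # 2 * 2^2 = 8
calculate(3, y -> 2y)     # 3 * 6 = 18
calculate(3, sqrt)        # 3 * sqrt(3)
```

The one-argument method is exactly the kind of closing-over wrapper that the second question below asks about.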

What Python doesn’t let me do (w/o exec) is what one might call “implicit dependency injection”: subcalc is left undefined inside calculate.py and then finds its way in, from the importing context, through the back door. This seems pathological to me from the perspective of reasoning about the code locally, but maybe it’s b/c I’m used to thinking in Blub and haven’t really grokked the implications of multiple dispatch yet.

Three questions I have for more experienced Julia developers:

  • Are there other use cases where that “implicit dependency injection” really comes in handy, beyond package structuring?
  • For the specific case of package structuring, am I correct in summarizing the arguments against each of the 3 options as:
      1. Putting all the code in one file is unergonomic.
      2. Putting the code into modules inside each file means boilerplate headers and makes refactoring difficult.
      3. Explicit dependency injection is awkward, because you’d just end up writing a wrapper in calculate.jl to close over subcalc?
  • Are there situations where you would want calculate to use multiple definitions of subcalc? (Maybe the answer is ‘always, b/c multiple dispatch’? But I’m not sure if having multiple methods is different from having multiple functions…not a meaningful distinction in Python anyway!)

My gut is telling me that this “implicit dependency injection” capability is a recipe for bugs, because it’s too powerful for the purpose of structuring packages – it seems like you could play some really counterintuitive tricks with it, but its intended use case here seems like a small subset of what it could actually be used to do.
[Edit: I can’t count…]

There’s no magical “implicit dependency injection” going on here. All that’s happening is that the code for a single module is split between more than one file. If you were to concatenate the files and include only the concatenated file in the module, the results would be exactly the same.

To use your example, the following two scenarios are exactly equivalent:

Scenario 1

module A

function calculate(x)
   x*subcalc(x)
end

subcalc(x) = 2x

end

Scenario 2

# file A.jl
module A

include("calculate.jl")
include("subcalc.jl")

end
# file calculate.jl
function calculate(x)
   x*subcalc(x)
end
# file subcalc.jl
subcalc(x) = 2x
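
One way to check the equivalence is to assemble module A from files and compare it with the inline version. This is a sketch: the files are written to a temporary directory and pulled in with Base.include so the example runs on its own, and module B is an extra name introduced only for the comparison.

```julia
# The two files are written to a temporary directory only so the example is self-contained.
dir = mktempdir()
write(joinpath(dir, "calculate.jl"), """
function calculate(x)
   x*subcalc(x)
end
""")
write(joinpath(dir, "subcalc.jl"), "subcalc(x) = 2x\n")

# Scenario 2: an initially empty module whose body comes from the two files.
# Base.include(A, path) is what a plain include(path) inside module A does.
module A end
Base.include(A, joinpath(dir, "calculate.jl"))
Base.include(A, joinpath(dir, "subcalc.jl"))

# Scenario 1: the same definitions written inline (named B to avoid a clash).
module B
function calculate(x)
   x*subcalc(x)
end
subcalc(x) = 2x
end

println(A.calculate(3), " ", B.calculate(3))   # 18 18
```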

Does anyone ever write conditional include statements? Or nested includes? Or including the same thing in multiple places? If I knew I could assume that a given file was always included exactly once, in the same place, then I would feel like it was equivalent to just copy/pasting the code in. My guess is that people probably don’t do crazy stuff (at least not intentionally!), but my point is that from within the file, there’s no way to tell that it won’t happen. I suppose also that the indirection of the package management system means that people won’t be reaching out to include something from inside a package they’re importing, and that the package wraps the included code in a module as well, so it’s not just getting dropped “bare” into someone else’s code when they import the package. I think you’re telling me it’s safe to make assumptions I wouldn’t a priori know that I could make.

I imagine a response to my concern might be: “A bare file like this isn’t meaningful until it is included somewhere - it’s just a half-finished thought. There’s no point trying to reason about it in isolation – stop trying to take that perspective, and just look at the system from the top, where the include statement happens. Everything makes sense this way.”

I have two responses to that hypothetical:

  • If there is a mechanism in Julia that ensures any given file is included once and only once, then I’m happy, b/c that mechanism can also be used mechanically to locate where this include is happening and take me to it deterministically.
  • If included files can’t be understood in isolation, then they shouldn’t be isolated.

Could someone say why it’s useful to have a single module split between multiple files? If a module is a single logical unit, what’s the point of using multiple files? If the other files aren’t part of the same logical unit, I can use multiple submodules.

I’m someone who used Python heavily and had just learned some C++ before using Julia.

To me, Python is a world where files and folders are modules and packages, and you import whatever you need, whether it’s your own code or someone else’s package, so within every single file you know where the code is coming from.
People coming from Python are used to having A.B inside A, so that something else in A can use A.B.

And C++ programmers tend to #include everything; it feels like copying and pasting everything into a single file and compiling it. There are just files and includes.
Julia’s system looks similar to C++20’s module system (which the C++ world hasn’t really adopted yet), plus a package system.

In practice, after reading several mature Julia projects, I feel that most Julia packages follow the old C++ way. People just don’t use submodules; their package has a single module which includes all the .jl files. They keep things simple within a package so that any complicated dependency problems can be handled by Pkg.jl.

Even when they do use submodules, like CUBLAS within CUDA.jl, it’s not what people from Python would expect, i.e. something in CUDA.jl requiring the submodule CUBLAS. Instead, the submodule CUBLAS is quite independent: it does using …CUDA, and CUDA includes CUBLAS and exports it.
It’s as if there is A.B, but B feels like an independent package (actually a module) written inside A; A doesn’t use B at all, it just exports it. And everything that is used in A is just files being included.
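
A minimal sketch of that layout, using hypothetical names (Parent in place of CUDA, Child in place of CUBLAS) and inlining the submodule instead of putting it in its own file:

```julia
# In a real package, Child would live in its own file and be pulled into Parent
# with include(...); it is inlined here to keep the sketch runnable.
module Parent
export Child, twice

twice(x) = 2x          # functionality that lives directly in Parent

module Child
using ..Parent         # Child depends on Parent, not the other way around
quadruple(x) = twice(twice(x))
end # module Child

end # module Parent

using .Parent
Child.quadruple(3)     # 12
```

Nothing in Parent calls into Child; Parent only defines and exports it, while Child reaches back into Parent through the relative using ..Parent.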

This difference in how people think about A.B is the source of the confusion for people coming from Python.
For example, Python people would make utils.jl a module, and then it feels very bad to have to write include("utils.jl") followed by using .Utils; and since several places may use it, the same file ends up being included several times.
But in every Julia package I’ve seen, utils.jl is just a file, not a module, and include("utils.jl") sits at the top of the main module.
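
A rough sketch of why the Python-style approach hurts: if utils.jl defines a module and is included in two places, each include evaluates the file again and creates a separate module. The submodule names, the helper functions and the temporary files are hypothetical, and Base.include(mod, path) stands in for calling include inside each module.

```julia
# Pitfall: a utils.jl that defines `module Utils`, included from two submodules.
dir = mktempdir()
write(joinpath(dir, "utils.jl"), """
module Utils
helper(x) = x + 1
end
""")

module PkgA
module Sub1 end
module Sub2 end
end

Base.include(PkgA.Sub1, joinpath(dir, "utils.jl"))  # evaluates the file inside Sub1
Base.include(PkgA.Sub2, joinpath(dir, "utils.jl"))  # evaluates it again inside Sub2
println(PkgA.Sub1.Utils === PkgA.Sub2.Utils)        # false: two unrelated copies

# Common layout: utils.jl holds plain definitions and is included once at the
# top of the package module, so its functions are visible everywhere in it.
dir_b = mktempdir()
write(joinpath(dir_b, "utils.jl"), "helper(x) = x + 1\n")
write(joinpath(dir_b, "feature.jl"), "feature(x) = 2 * helper(x)\n")

module PkgB end
for file in ("utils.jl", "feature.jl")
    Base.include(PkgB, joinpath(dir_b, file))
end
println(PkgB.feature(1))                            # 4
```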

To me, there is some redundancy in having file/module/package as three independent things: it gives some people freedom, but it gives others confusion. And in real-world Julia packages, you don’t really see people using the module concept efficiently. People mostly use files within a package, and use packages for real dependency problems.

I would suggest clarifying this mindset difference in the Noteworthy Differences section of the docs, which currently mentions it in just one line: “The logical Julia program structure (Packages and Modules) is independent of the file structure (include for additional files), whereas the Python code structure is defined by directories (Packages) and files (Modules).”
People coming from Python don’t understand what it really means for packages, files and modules to be independent things, or how to manage code in such a setting.

Does anyone ever write conditional include statements?

Yes, but it’s rather uncommon. I’ve mostly seen it and used it myself in conjunction with the Requires package. Otherwise it should mostly be conditioned on platform or availability of external dependencies.
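
A platform-conditioned include might look like the following sketch. The file names and the platform_name function are hypothetical, and the files are written to a temporary directory only so the example runs on its own.

```julia
# Inside a package this would usually be written as
#   if Sys.iswindows(); include("windows.jl"); else include("unix.jl"); end
dir = mktempdir()
write(joinpath(dir, "windows.jl"), "platform_name() = \"windows\"\n")
write(joinpath(dir, "unix.jl"), "platform_name() = \"unix\"\n")

# Only one of the two files is ever evaluated into the module.
module Platform end
Base.include(Platform, joinpath(dir, Sys.iswindows() ? "windows.jl" : "unix.jl"))

println(Platform.platform_name())   # unix on Linux/macOS, windows on Windows
```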

Or nested includes?

This is somewhat common for large packages. Especially if submodules are used it’s kinda natural to let those handle the includes of their own files.

Or including the same thing in multiple places?

This is rare but I’ve seen it in the wild. I’m not convinced it has ever been a good design though.

I suppose also that the indirection of the package management system means that people won’t be reaching out to include something from inside a package they’re importing

I’ve never seen this and it would be a very questionable thing to do.

Even if various parts form a coherent whole, it might make sense to split code into multiple files for easier code navigation (i.e., no distractions from other parts when you are working on something).

This is of course a matter of taste, people should do what they like.

It helps with getting an overview, and to some extent it can reduce version control conflicts when multiple people are working on the code.

Anecdotally I recently inherited some Python code at work which included a 1600 line class. It didn’t do anything egregious and outsourced a lot of its functionality to other files. It was simply big and had somewhat ambitious docstrings. It was, however, very difficult to get an overview of the code. Eventually I expect I can refactor it to be more manageable but short term I’m rather missing the option to split it into 3 or 4 files (based on sub-functionality) and just include the pieces to stitch them together.

You’re thinking in Blub :wink:

Files have no meaning in the structure of a Julia program, (besides the main PackageName.jl). They are just a way of breaking up code, because 5000 line files are hard to grok, and annoying to use with git.

If your subcalc method is defined or imported anywhere in a module then it is available everywhere in the module. There is no “back door” because all the files included in a module are in the same house already. And there is no “implicit dependency injection” happening.

It does work how you want if you use modules, rather than just files. But mostly we don’t do that because 1. we just reuse the same method names everywhere and have few name space collisions. 2. packages are the level of modularity. To illustrate, I have a package dependency depth of five packages that I maintain in some places. But a maximum module depth of 1 inside any of them.

I suppose also that the indirection of the package management system means that people won’t be reaching out to include something from inside a package they’re importing, and that the package wraps the included code in a module as well, so it’s not just getting dropped “bare” into someone else’s code when they import the package. I think you’re telling me it’s safe to make assumptions I wouldn’t a priori know that I could make.

Additionally, you can make this assumption a priori. A file is only dropped bare into your code when you manually call include. import imports a module, not raw code. Files and modules are not related.

I do not use conditional include statements, but I use Requires.jl, which may be seen as something similar (though I think it does conditional evals rather than includes).

The use case is simple. My package provides a command-line tool that uses the functionality the package provides. However, my package/script depends on a solver to work: only one of the roughly 4 it supports. I do not want to force the user to have a specific solver, or all four of them (some are paid and cannot be installed easily), so Requires.jl allows me to load the code that interfaces with a given solver only if that solver is loaded in the environment the script runs in. The only thing the user must do to use the tool is enter the tool’s environment once and add the solver package they intend to use. Or, if they want to write their own script using my package, they only need to load one of the four solver packages for things to work seamlessly.

I think that looking at Requires.jl use cases will probably reveal which packages depends on something similar to conditional include statements.

I use this feature heavily. I have a big package that provides multiple mathematical formulations for a specific problem, so each mathematical formulation has its own submodule. However, each formulation module contains not only the code for model creation but also the code for plotting the results (which changes with each formulation), etc. I do not create a second level of submodules (FormulationName.Build, FormulationName.Results, …) because it is overkill: the Results methods need to know the auxiliary structs used by Build, and I would need extra imports between submodules that serve the same formulation and have no conflicts with each other. So, to better organize my code, I split these different features into files and include them directly into the submodule. I am happy with the results: I find what I want faster now, without any overhead of headers and imports.
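
A condensed sketch of that layout, with hypothetical module, struct and function names; the file boundaries are marked with comments instead of actual include calls so that it runs on its own.

```julia
# In a real package, the sections marked "build.jl" and "results.jl" would be
# separate files pulled into each formulation submodule with include(...).
module Formulations

module FormulationA
# --- build.jl ---
struct ModelData                 # auxiliary struct shared by build and results code
    coeffs::Vector{Float64}
end
build(coeffs) = ModelData(coeffs)

# --- results.jl ---
summarize(m::ModelData) = sum(m.coeffs)   # sees ModelData with no extra imports
end # module FormulationA

module FormulationB
# Same struct and function names, but no clash: each formulation has its own namespace.
struct ModelData
    weights::Vector{Float64}
end
build(weights) = ModelData(weights)
summarize(m::ModelData) = maximum(m.weights)
end # module FormulationB

end # module Formulations

m = Formulations.FormulationA.build([1.0, 2.0, 3.0])
println(Formulations.FormulationA.summarize(m))   # 6.0
```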

I was afraid of that! :sweat_smile:

A lingering question I have: is it ever the intent for the definition of subcalc upstream from calculate to be different in two different places where calculate.jl gets included? For instance, does subcalc do sqrt in one context and x^2 in another, or can I always assume when I read the text of calculate that subcalc will refer to the same function?

For example, like this:
distance.jl

subcalc(x) = sqrt(x)
include("calculate.jl")

norm.jl

subcalc(x) = x^2
include("calculate.jl")

If something like this happens, then the meaning of subcalc in calculate.jl is ambiguous. As far as I can tell, nobody is suggesting that there is a valid use case for something like this. It sounds like the intent is always to have the same context wherever calculate gets included – even if the include happens in multiple places, it is assumed that all the missing identifiers are the same. But there’s no guarantee that the context will be the same. By putting a namespace around calculate, you provide that guarantee. IMO, unless the intent is indeed to change how calculate works in different contexts, then including instead of using leaves the door open to undesired effects.
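
A small sketch of exactly that situation, with hypothetical Distance and Norm modules standing in for distance.jl and norm.jl, and a temporary file standing in for calculate.jl:

```julia
# The same calculate.jl is included into two modules, each with its own subcalc.
dir = mktempdir()
calc_path = joinpath(dir, "calculate.jl")
write(calc_path, "calculate(x) = x * subcalc(x)\n")

module Distance
subcalc(x) = sqrt(x)
end

module Norm
subcalc(x) = x^2
end

Base.include(Distance, calc_path)
Base.include(Norm, calc_path)

println(Distance.calculate(4.0))   # 8.0  (4 * sqrt(4))
println(Norm.calculate(4.0))       # 64.0 (4 * 4^2)
```

The text of calculate.jl is identical in both cases; which subcalc it calls is decided entirely by the module it is included into.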

I played around a bit, and it looks like Julia generates warnings for pathological cases (ex: including both norm.jl and distance.jl into the primary module, or even if I turn those into submodules and try to export calculate from both.) That covers most of what I’m concerned about. If neither submodule Distance or Norm exports calculate, then there is no warning. That still bugs me a little, but I guess I can get behind it because the two versions of calculate are appropriately isolated in their own namespaces, so there’s no reason they should mean the same thing.

Not to be too disrespectful, because some questions are genuine, but it seems to me that an awful lot of people try to construct examples of misuse of includes and then cry about the errors they would cause.
That behavior is equivalent to standing in the kitchen and figuring out that knives can’t just cut bread but also your arm. And then cutting your arm. And then proudly showing your cut arm. And then arguing knives should be prohibited.
How about not cutting your arm?

Just because Julia allows you the freedom to be a bad programmer doesn’t mean you have to be one. There is nothing wrong with includes. If you desire a simple way to import single files in their own namespace, then by all means, demand just that. But don’t equate that to includes being wrong or harmful.

I don’t really see how you can look at that issue and conclude that there is “no proper discussion” or that dissenting voices are “slapped down”. Moreover, how is this kind of comment constructive? It is dismissive of the work and thought that have gone into that thread from all parties — which apparently doesn’t warrant the label “proper discussion”. It is also discouraging to core devs like myself who have been trying to move it in the direction of an actual solution, since we are apparently “slapping down” dissenting voices. In general, this kind of meta-complaint is not only completely unhelpful, it’s also one of the more exhausting things about open source. Please don’t do it.
