Implicitly loaded modules in the future?

Am I wrong that the conversion of using "Foo.jl" into include("Foo.jl"); using .Foo can be handled by an external package? Granted, it would be @using instead of using, and the user would have to add such a package as a dependency, but that looks like a rather small price to pay. Is it really necessary to add such a thing to Base?
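A minimal sketch of what such a package-level macro could look like (the name @using_file and all details here are hypothetical, not an existing package):

```julia
# Hypothetical macro: `@using_file "Foo.jl"` expands to the
# `include("Foo.jl"); using .Foo` pattern described above.
# Caveats not handled here: include-once semantics, nested paths, etc.
macro using_file(path::AbstractString)
    modname = Symbol(splitext(basename(path))[1])  # "Foo.jl" -> :Foo
    return esc(quote
        include($path)
        $(Expr(:using, Expr(:., :., modname)))     # AST for `using .Foo`
    end)
end
```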

Alternatively, it would be good to have something like Experimental.using, which wouldn’t have to conform to SemVer rules and could be removed between versions.

I mean, it would be really good to experiment with such a feature and make it part of the language only after it has proved itself really useful.

4 Likes

Perhaps my lack of understanding comes from not seeing this as a problem but as a good practice (one that, perhaps, evolved somewhat accidentally). Packaging up some sub-functionality with a cleanly defined API is an advantage for the ecosystem; with generic programming being a powerful feature of Julia, people may end up using it for something else. It also makes things like CI more modular. Packages are super cheap in Julia, and the remaining issues (such as registering a chain of dependencies) could be solved by tooling.

I understand that some people want this, and also that it is a natural feature for some languages, but I am not convinced that it is that natural for Julia.

A lot of organizations now maintain networks of 20+ interconnected packages (SciML, JuliaImages, and plenty of others). It would be great if their maintainers and principal architects could share their thoughts on this discussion: do they miss the ability to automatically orchestrate the loading order for large packages? Would they prefer to create large monolithic repos if they had this feature?

I would propose that it is also a selection pressure to some extent in a FOSS context: namely, projects with complex internal structure may be more difficult to contribute to.

Finally, before changing the language per se, it would make sense to at least make a list of problems that could be tackled by tooling. The following come to mind:

  1. support for registering a bunch of interdependent packages at once
  2. metadata for marking packages that are meant for semi-internal use, for use in JuliaHub and similar, so that they come last or are omitted in search for user-facing functionality
  3. a package that implements a wrapper for including a script within a module (that is a use case that came up in the related issue)

It could be that these are inadequate solutions and the language should change after all, but then at least we have a bunch of well-documented use cases. Also, if someone happens to tackle one of the tooling issues, its usage could be informative on how common these issues are.

4 Likes

A monolithic repo would make absolutely no sense for SciML. We wouldn’t be able to afford it. Pooling all of the tests together probably leads to a test set that takes upwards of 60 hours. Also, students are already scared of the bigger, more active repos (OrdinaryDiffEq.jl, DiffEqSensitivity.jl, etc.) and tend to congregate at the peripheries (NeuralPDE.jl, DiffEqFlux.jl, etc.), where there’s less code and “more internal documentation”, i.e. what’s used in the package are other libraries like DifferentialEquations.jl, which are fully documented packages. Pulling it all into a monorepo essentially takes away the newcomer havens and requires that someone adopt the practices of a 20+ person active repo in order to be a contributor: why?

So there’s hardly any benefit to such a monorepo, but there are pretty massive downsides. You’re not changing complexity, you’re just making a trade-off on who gets the complexity. The benefit is mostly to the maintainer, because they now have less work keeping the boat afloat: every PR runs the massive load of tests, and you know what could have broken. But you’ve now shifted the complexity to the newcomer, who has to know the downstream effects from submodule A to submodule D through submodules B and C in order to get tests passing. Today it’s “let’s merge and make this a breaking release, I’ll handle downstream. Thanks! :grinning_face_with_smiling_eyes:” and tomorrow it’s “tests are failing, cannot merge. Let’s work on this for 6 months”? I don’t think it’s very surprising that GSoCs on the JuliaLang/julia repo have by far the highest failure rate.

If you want to know how SciML can get the mindshare of so many young developers in a somewhat niche subject, keep them for years at a time, and grow them into maintainers at a rate exponentially higher than monorepos do, that right there is our strategy, and we don’t want to lose that advantage. We don’t have the funds of a Google or Facebook to pay enough people to want to deal with a Jax- or PyTorch-complexity project. And we don’t want to exclude young undergrads and high schoolers from being valuable and useful maintainers, even if it’s just of MuladdMacro.jl or RuntimeGeneratedFunctions.jl.

[But closing note, that doesn’t mean submodules never have a place. Notably, ModelingToolkit.jl has the StructuralTransformations submodule. Nice syntax on that could be… nice. But a syntactic change wouldn’t cause an organizational change because there are many “beyond code” reasons for the organizational structure]

13 Likes

I would note that no-one seems to be advocating monorepos. (Least of all any proponents of this change, as far as I can see.) I don’t see that this is relevant to the discussion.

This current discussion about “complex internal structure” seems like a funny one to me. We all agree that we don’t want complex internal structure.

Some folks seem to prefer the current system simply because it makes it very difficult to maintain any internal structure, and see this as a pressure towards writing better code.

Others, such as myself, prefer these proposed additional features so as to make any internal structure more apparent… and see this as a pressure towards writing better code.

5 Likes

I was literally asked, and quoted the question.

That’s the biggest strawman I’ve seen. “People who advocate for it like it because it’s bad”. Consider the arguments first.

  1. namespacing and code loading are different features
  2. non-determinism is harder to maintain than deterministic code
  3. less reliance on the exact compiler implementations for correct code loading and order
  4. lack of a linear record, i.e. it’s impossible to know the actual structure of the code because it’s not written down but generated.

That doesn’t mean that such a mixed code loading + namespacing couldn’t have an interesting syntax. But ignoring the actual engineering issues, purposefully being inflammatory (i.e. saying it’s censorship to say that Julia has modules? …), and ignoring the comments of others as something advocated by “no-one” is not an inviting strategy nor is it technically sound. I think the inflammatory nature of many of these comments is what has sent this into a standstill more than anything technical, and doing community building around the ideas would be much more helpful.

8 Likes

Sorry to interfere, but this thread is so long and there have been so many ideas that it is really hard to understand what features are being proposed.

Can you give an example, how code is going to look like with proposed features? Small MWE, just to understand what we are talking about and how it differs from the current system.

I mean, for example, we have three files

# mymodule.jl

module MyModule

include("a.jl")
include("b.jl")

export foo, bar
end
# a.jl

function foo end
# b.jl

function bar end

So, with this template, what changes should I make to use the proposed features, and why is it better?

1 Like

A post was split to a new topic: Suggestion: use PEP-like process for changes

I don’t read the previous question as having anything to do with monorepos.

This discussion has been about explicit declaration of dependencies, which is an entirely different topic.

Nothing I say is intended to be inflammatory. I stand by my statement as being factually accurate – evidence:

so let’s not inject needless emotion into it. (I suppose we agree about that at least!)


Have a look at Implicitly loaded modules in the future? - #139 by patrick-kidger from earlier in the thread. :slight_smile:

(Which in turn refers to the disadvantages of the current approach – see Implicitly loaded modules in the future? - #44 by patrick-kidger .)

4 Likes

Folks, I haven’t been able to keep up with this conversation but I can assure people that no convincing is needed that there is a feature to be added here. The question is just what that feature should be. It is, of course, perfectly possible to use include the way you can today but that doesn’t mean that something more automatic and requiring less boilerplate cannot be done — and it should! I opened #4600 back in 2013 (man time flies) and still think we should do something. The reason it hasn’t happened yet is because it’s far better to live without a feature than to add the wrong feature. As it happens, there are fundamental problems with the “original plan” we cooked up in 2017, which were only recently pointed out by @patrick-kidger, so I’m quite glad we dragged our feet on implementing it, because if we had we might have been stuck with a broken design.

Bottom line: please stay cool and be excellent to each other.


43 Likes

I think we’re at the point where concrete code demos are needed for clarity.


It seems like there are two versions being proposed.

@patrick-kidger proposes a version that doesn’t use submodules:

# master.jl
import "A.jl": foo
import "B.jl": bar

# utils.jl
common = 1

# A.jl
import "utils.jl": common
foo = common + 2

# B.jl
import "utils.jl": common
bar = common + 3

And there is another suggestion for a version using submodules:

@stevengj in this version, is it expected to look like this, with the Utils submodule declared inside utils.jl

# master.jl
import "A.jl": foo
import "B.jl": bar

# utils.jl
module Utils
common = 1
end

# A.jl
import "utils.jl": common
foo = common + 2

# B.jl
import "utils.jl": common
bar = common + 3

, and import "utils.jl": common expands to

include("utils.jl")
import .Utils: common

? Or something else?

@ChrisRackauckas writes

I think I understand what you mean by (1) and (3).

On (2), I don’t get what non-determinism refers to in these examples.

On (4), I agree that when it’s hard “to know the actual structure of the code because it’s not written but generated” that’s a downside, but I would have described the current situation that way, not the proposed one. Currently when reading B.jl, I don’t know what common refers to without looking at the context in which B.jl is included. In the proposals, it is clear what common refers to. So I don’t get how the proposal makes the structure less clear in your perspective. I would like to understand that.

3 Likes

It’s a little more complicated than this because you would want to be able to have multiple import "Utils.jl" statements (possibly from different submodules) and have it end up including the file once (i.e., evaluating the module only once, so that symbols are === from different imports), like Base.require.

Note also that you would probably want to enforce the filename-module correspondence. i.e. import "Utils.jl" would require the file Utils.jl to define a module named Utils. So import "utils.jl" would be an error if the module name was Utils. (If you don’t want this correspondence you could still use manual include statements.)
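A rough sketch of what that include-once machinery could look like (the helper name and all details here are my own illustration, not an actual proposal):

```julia
# Cache keyed by absolute path, so a file's module is evaluated only
# once no matter how many places import it (its symbols stay ===).
const _loaded_files = Dict{String,Module}()

function require_file(m::Module, path::AbstractString)
    abs = abspath(path)
    get!(_loaded_files, abs) do
        # Enforce the filename-module correspondence: "Utils.jl" -> Utils
        expected = Symbol(splitext(basename(abs))[1])
        Base.include(m, abs)
        isdefined(m, expected) ||
            error("$abs must define a module named $expected")
        getfield(m, expected)
    end
end
```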

3 Likes

There is the “sub package” idea:

# mypackage.jl
using MyPackage.A : foo
using MyPackage.B

# A.jl
subpackage module A
  foo(x) = 1
end

# B.jl
subpackage module B
  bar(x) = 2
end

where MyPackage.A and MyPackage.B are:

  1. Independent packages that get their own UUIDs automatically.
  2. Are searched for in the subdirectories of MyPackage (no need to register while developing).
  3. Can be used and installed as independent packages with ] add MyPackage.A or something similar.
  4. Share the same version as MyPackage (?)

(maybe within MyPackage one could use simply using .A and using .B).

Thank you for putting it together!

In the first version, I do not understand how multiple dispatch should be treated.

# utils.jl
function foo end

# A.jl
import "utils.jl": foo
function foo(x::Int) end

# B.jl
import "utils.jl": foo
function foo(x::String) end

What should I put in master.jl?

# master.jl
import "A.jl": foo
import "B.jl": foo

looks redundant (if I have a hundred different definitions of foo, should I import all of them??)

and

# master.jl
import "utils.jl": foo

kind of defeats the purpose of this idea, because we’ve lost the information that the actual definitions live in A.jl and B.jl. So discoverability is lost (and it was one of the main points of this proposal, if I get it right).

That code was from @patrick-kidger, so he’s better positioned to answer.

2 Likes

I don’t think having to write out import "Xi.jl": foo for i=1,…,n is any worse than include("Xi.jl") for i=1,…,n. If you have a lot of code objects then you need to write a lot of code, that’s always going to be true.

I agree that multiple dispatch offers a shortcut past discoverability. That’s equally true of any other operation that mutates an existing module, though, of course. MD is quite a common use case, so perhaps there’s a way to refine things to help with this? Ideally you’d import the files that define the methods you’re actually using.

So overall what you’ve written down looks correct to me – and is also how FromFile currently works.
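For reference, a small example of what this looks like with FromFile today (syntax from memory; check the package README for details):

```julia
using FromFile

# Loads utils.jl (once, even if several files import from it) and
# brings `common` into scope in this file.
@from "utils.jl" import common

foo = common + 2
```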

1 Like

I do not know, I would not feel happy writing

import "A.jl": foo1, foo2, foo3, foo4, ..., foo10, .... foo100
import "B.jl": foo1, foo2, foo3, foo4, ..., foo10, .... foo100

as opposite to

include("A.jl")
include("B.jl")

And I guess there is a wildcard option

import "A.jl": *
import "B.jl": *

but again its usage defeats the purpose.

This is only my opinion and personal feeling; probably this approach is really useful in some cases. But it looks scary and not like something I would want to use in day-to-day work. Too much of a Java flavour.

There are two questions a reader can ask:

  • Which files are providing names into this file?
  • Which names are those files providing?

When I’m reading a file, I like to know the answers to both questions using information available in the same file.

  • Currently I have to look in another file to answer the first question and I have no way to answer the second.
  • Under import "A.jl" : * I can answer the first question without switching files but answering the second requires looking in A.jl.
  • Under import "A.jl": foo, I can answer both questions without switching files.
3 Likes

Sorry for derailing this conversation again, but after some thinking, it seems that if one leaves discoverability aside (for example, if you are fine with grep and friends), dependency DAG resolution is something very different and can be solved with rather lightweight changes.

One can introduce two additional commands, depends and use. The first declares all parent files which should be loaded before this file is included, and the second includes the file at some point, but only after the DAG is resolved.

Example of usage can be like the following

# master.jl

use "A.jl"
use "utils.jl"
use "B.jl"

export foo

# utils.jl
function foo end

# A.jl
depends "utils.jl"

function foo(x::Int) end

# B.jl
depends "utils.jl"

function foo(x::String) end

Now, the main idea is that use is almost the same as include, but order doesn’t matter. During package compilation, the compiler goes through all used files, extracts the depends information, builds the DAG (at this point it can validate the graph and issue errors if circular dependencies are detected), and as a result generates (internally) a proper include list.

This approach removes most of the mental effort from the user, and the price is not very high: one only needs to declare the immediate parents of each file, which looks reasonable. At the same time it can still use the current Julia machinery (it’s basically a lightweight preprocessor). Is it a reasonable approach?
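As a toy illustration of the DAG resolution step (everything here is a hypothetical sketch, with no error handling beyond cycle detection):

```julia
# Scan each file for `depends "…"` lines, build the DAG, and return
# the files in an order where every parent precedes its children.
function resolve_includes(files::Vector{String})
    deps = Dict(f => String[] for f in files)
    for f in files, line in eachline(f)
        m = match(r"^\s*depends\s+\"(.+)\"", line)
        m === nothing || push!(deps[f], m.captures[1])
    end
    order = String[]; visiting = Set{String}(); done = Set{String}()
    function visit(f)
        f in done && return
        f in visiting && error("circular dependency involving $f")
        push!(visiting, f)
        foreach(visit, get(deps, f, String[]))  # parents first
        delete!(visiting, f); push!(done, f); push!(order, f)
    end
    foreach(visit, files)
    return order  # `include` the files in this order
end
```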

2 Likes

This is a reasonable approach – something similar has come up before, as the essential design behind PatModules. In practice that didn’t stick around long before we replaced it with FromFile, i.e. something like the current topic of discussion, which IMO is a bit simpler. Only one keyword, for starters.

2 Likes

Yes, I see.

The dependency DAG has a root, and this root should be at the bottom; it’s master.jl in this case. Then there is no difference between depends and use, because one can say that master.jl depends on all files in the DAG (since nothing depends on master.jl, it shouldn’t introduce any cycles). And if we change the word depends to import, then it seems we really arrive at the idea that you proposed.

So, if one can do the following:

  1. Determine the root automatically
  2. Just import files without any extra effort (import "a.jl": foo can be a nice additional feature for those who want more fine-grained control)

then as a result one gets a very nice tool which auto-resolves the dependency DAG, and that is awesome. Actually, since only immediate parents are required, the resulting structure can look like

# master.jl

import "B.jl"
import "A.jl"

export foo

# utils.jl
function foo end

# A.jl
import "utils.jl"

function foo(x::Int) end

# B.jl
import "utils.jl"

function foo(x::String) end

I.e. no import of utils.jl in master.jl, and an arbitrary order of imports in every file.

Yes, if FromFile.jl can do this, then it is an awesome, undervalued package, and it should be added to Base in one form or another.