Implicitly loaded modules in the future?

This looks interesting as a solution to the modules issue. But it would be better if something similar were a standard language feature, because if developers start creating special packages to fix the language, then we are inventing a special dialect which a regular Julia developer will find hard to follow.

My opinion on the meta-programming features of Julia is that they can certainly be useful for making code more compact, but at the same time they can lead to a dialect that most people will not be able to understand until they read all the associated documentation.

It seems to me that requiring users to care about files is counterproductive. If a user needs some functionality, it should be in a package, and no include is needed. The developer is free to organize things into files, based on the DAG representation of the code (in the developer’s head), but why would a macro be needed for that? Just decide what the physical representation in files should look like, and then use import. In my mind files and modules are independent concepts. I could have organized my package as a single giant file, and the modules would work just as well as when I keep it one module per file.

2 Likes

Isn’t the solution here to use dev appropriately? The cost is that you have to worry about multiple Project.toml files, but maybe that should be viewed as a good thing because it clarifies the DAG?

shell> tree
.
├── dev
│   ├── Sub1
│   │   ├── Project.toml
│   │   └── src
│   │       └── Sub1.jl
│   └── Sub2
│       ├── Manifest.toml
│       ├── Project.toml
│       └── src
│           └── Sub2.jl
├── Manifest.toml
├── Project.toml
└── src
    └── Test.jl

6 directories, 8 files
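
If I understand the dev workflow correctly, the subpackages are then developed into the parent environment by path. A sketch (this assumes Sub1 and Sub2 each have a valid Project.toml with their own UUID):

```julia
using Pkg
Pkg.activate(".")                # activate the parent project
Pkg.develop(path="dev/Sub1")     # track the local subpackages by path
Pkg.develop(path="dev/Sub2")
# after which `using Sub1` / `using Sub2` resolve against the local copies
```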

It doesn’t solve the problem of “what’s the context this file is evaluated in right now”. But I’m not sure that particular problem bothers me that much because it’s just struct definitions that mess that up and I appreciate the flexibility of splitting code across multiple files.

6 Likes

This approach doesn’t scale as well. An include()/package-based approach is:

  • Heavyweight, requiring the creation of a package at every branch point in the code dependency DAG. That’s a lot of overhead just for some helper file.
  • Requires a developer to look up O(n) code in the size of the codebase n to track the dependencies of any individual piece of code. This is as opposed to just O(log n) in a well-designed file-based approach.
  • Implicit. Within each package, dependency DAGs are held only in developers’ heads, instead of being explicitly written down. Harder to on-board people / easier to make mistakes.
  • Open to spooky action at a distance. It is possible to construct natural examples in which changes in one set of files will affect method resolution in unrelated parts of the code.
8 Likes

A beginner’s perspective here.
The only thing I’m missing in the language is the ability to easily import local modules (a small module I have somewhere).
Now you have to include("MyModule.jl") and using .MyModule, which has the drawbacks of include, or generate a package. Even though packages are very lightweight, I still think it’s a hassle to generate a package and then add or dev it in the environment. FromFile.jl seems to offer a nice solution for that.
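
For reference, the FromFile.jl usage looks roughly like this (mymodule.jl is a hypothetical local file defining MyModule; check the package README for the exact syntax):

```julia
using FromFile

@from "mymodule.jl" import MyModule   # loads the file (once) and imports MyModule

MyModule.hello()                      # hypothetical function defined in MyModule
```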

It would give extra flexibility to be able to make a Module/namespace while working on a scientific project for instance.
Now I kind of understand how to use dev/add and include more or less properly, but it took me a lot of reading, scrolling through discourse, reading and rereading the manual…
It is definitely not straightforward for beginners, nor, it seems, for more experienced programmers, considering the number of posts created on this subject.
I think beginners would very much appreciate if there was a super easy way to call modules.

7 Likes

That’s not entirely why this happens. It partly is, but that’s only half the story. The key fact here is that, unlike in static languages, there is no separate “type context”: the right-hand side of a type annotation is simply an expression like any other, which is evaluated when the definition is evaluated. In this case the right-hand side “looks like” a type, M, but it could just as easily be an arbitrarily complex expression like identity(M) (which of course evaluates to M) or something even more complex like cond() ? M : O. This lack of distinction between “type language” and the real language is very powerful and allows people to do very nifty things without needing additional language features (just use the normal language features you already know), but it does mean that when you’re evaluating f(x::M) = x.m you need to already know what M is, since otherwise you can’t evaluate it as an expression.
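
A quick sketch of this (the type and function names here are made up for illustration):

```julia
struct M
    m::Int
end

cond() = true

# All three right-hand sides are ordinary expressions evaluated at
# definition time, and all three define the same kind of method:
f1(x::M) = x.m
f2(x::identity(M)) = x.m             # identity(M) evaluates to M
f3(x::(cond() ? M : Nothing)) = x.m  # arbitrary expression evaluating to M
```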

How does this work in static languages? Well, only certain things can appear on the right hand side of :: — i.e. names of types (not expressions that evaluate to types, but specifically names of types). So you can assume that M is the name of a type even if it hasn’t been defined yet, and you can wait until the type gets defined to evaluate the definition.

This isn’t going to change and isn’t related to modules at all, so is a bit of a digression from the primary subject of this thread.

23 Likes

The way forward here is a finalized design and implementation for #4600. We were making some progress towards that for a while with a good productive discussion. The last post which got a lot of likes was by @patrick-kidger. However, I find the design there problematic in a few different ways:

  1. This may be superficial, but the leading from blah import syntax that’s proposed is far too Python-influenced and doesn’t fit with how imports work in Julia, which is import|using followed by an identifier of the module to import, followed by the names to import.

  2. It has way too much flexibility and too many features: the ability to specify a file name, one or more modules, and multiple names to import is way over the top. Imports are already an aspect of the language with too much surface area and too many variations, which we want to reduce, not increase even further. A proposal with this many variations is not going to fly.

I got kind of fatigued by that discussion but if these issues can be addressed and we can make some forward progress then we could get somewhere, which would be good.

23 Likes

First of all, I would like to reiterate that function definitions are declarative, not imperative. You can define functions in any order you like:

julia> f() = g();

julia> g() = a;

julia> const a = 1;

julia> f()
1

What’s not declarative is struct definitions. You can’t refer to a type that hasn’t been defined yet. Suppose for a moment that Julia handled modules like Python does. Suppose I have three types with this DAG: A → C ← B, and suppose my modules look like this:

module Mod1

export A

struct A
    x::Int
end

end
module Mod2

export B

struct B
    x::Float64
end

end
module Mod3

using .Mod1
using .Mod2
export C

struct C
    a::A
    b::B
end

end

Ok, great. Now Julia will automatically figure out the correct order to load code, so I don’t have to use include().

But what if I change Mod3 to this?

module Mod3

using .Mod1
using .Mod2
export C

struct D
    c::C
end

struct C
    a::A
    b::B
end

end

Bam, now I get ERROR: UndefVarError: C not defined. The order of defining structs still matters inside a module! Maybe I should put every single struct definition into its own module, but that seems absurd. Besides, it’s just as easy to mess up the import statements as it is to mess up the order of definition of the structs, so you would just be trading one kind of runtime error for another kind of runtime error.

The bottom line is that struct definition order matters. Having declarative module dependencies with automatic code loading neither changes nor alleviates that.
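
Not a fix for the general problem, but for completeness: one common workaround when a field must reference a type that isn’t defined yet is to make the field type a parameter, which defers the lookup to construction time (a sketch, reusing the made-up names from above):

```julia
# D's field type is a parameter, so D can be defined before C exists.
struct D{T}
    c::T
end

struct C
    a::Int
    b::Float64
end

d = D(C(1, 2.0))   # d isa D{C}; the concrete type is filled in at construction
```

Note this loosens the type (D{T} accepts any T unless you add a constraint later), so it is a trade-off, not a substitute for getting the ordering right.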

8 Likes

I’m definitely glad to hear that there’s interest in making this happen if these issues can be addressed.

I think it should be enough to just switch the syntax from
from "file.jl" import obj
to
import "file.jl": obj

which I think would fix both issues.

4 Likes

Take a look at DataFrames.jl. There are tons of files included, some of which define types. If they were executed out of order that would cause problems.

But on the other hand, if each file had to list its dependencies on other files, that would be a cure worse than the disease. Every refactor would entail updating every other file. Each file would have to start with dozens of headers. Keeping track of that DAG would be a nightmare.

For me, the idea that Main is just another module, and everything living together inside modules is fine, because it’s a simple rule that is easy to understand, even if some of the behavior resulting from that rule is unintuitive.

6 Likes

I fail to see how throwing files into the mix will reduce the cognitive load!?

1 Like

I understand your arguments! I appreciate your points, but I’m afraid I don’t reach the same conclusion.

Fundamentally, if the dependency DAG is a “nightmare” then that’s probably an indication that things need organising better – regardless of your choice of import/include framework. Having an implicit nightmare DAG sounds like a worst-of-both-worlds when it comes to writing bug-free/readable/multi-developer code.

7 Likes

It’s literally the exact opposite. The reason why include is heavyweight is because you have to write down the dependency chain. The other approach is to have spooky-action-at-a-distance resolve it for you, hopefully in the right order, where by virtue of file names and magic things just exist that you would otherwise have to do by hand. That’s the natural engineering trade-off between being explicit about what code is included and being implicit.

The include approach is very explicit about exactly what code is executed and in which order, while the other approaches are implicit and “work it out”. If two of the submodules define the same global variable, what value does it end up with? There’s no explicit ordering: you’re at the mercy of FromFile.jl magic to define the ordering for you.

That’s not to knock the method at all: if some people feel that FromFile.jl helps them design things in a way they like, then go for it. But implicit, non-deterministic ordering is the downside of not requiring that someone say “line 1 goes before line 2”, which is the exact requirement some people are arguing they don’t want from include. You have to choose between the two, but let’s not stretch the advantages of either too far.

8 Likes

I think a main source of disagreement is file size. Am I right in understanding that you are used to working with very long files in Python?

So the “in which order” is really the bit where things go wrong here. I find it much (much) better to express code as a DAG rather than linearly. DAGs are O(log n) in depth and avoid manual topological sorting, for example.

Moreover anything implicit is just an easy source of bugs, really.

Not really. A few hundred lines would be typical; anything past about 1000 is where I really start to try and split things up. (Moreover I try to avoid the direct comparison to Python. I’m aware that “developer from other language complains about Julia” is a stereotype around here, and I understand that such comparisons are not always profitable.)

7 Likes

That’s nice, but it does mean the final ordering in which the code is actually loaded is implicit. So I agree that “anything implicit is just an easy source of bugs”, which is the worrisome issue with having it all worked out implicitly by a resolver on a DAG. include forces you to choose an order, and that order makes everything well-defined: you can know what the value of any overridden global will be.

The other method can mean that if you do Main.a = i in modules A_i, the DAG resolver can choose to order them forward, reverse, or not sort them at all, and each of those can change the value of a in the user’s REPL. It can change when you change the version of Julia, because that could potentially change the heuristics of the resolver. It can change when you change a file name. That, by definition, is spooky action at a distance.

That’s not to say it’s any worse, there’s definitely some benefits to having all of that implicit. But it’s implicit.

3 Likes

Does nobody like the idea of having a kind of module, or a macro, that parses the module body as a Pluto notebook (including any included code, of course)? Nothing implicit at all there. It could be ( :joy:):

module MyModule
  implicit none # <-- !! 
  include("file1.jl")
  f(x) = 1
  ...
end
1 Like

That’s true, it’s a reasonable point that code loading happens in an implicit order.
Personally I find that a more-than-worthy trade-off for the issues I mentioned earlier. Doing anything akin to Main.a = i is pretty suspicious behaviour already. (A bit like type piracy really – don’t touch what you don’t own.)

1 Like

Do you mean worst case or average case? Because clearly the worst case is a linear DAG. If you have a linear DAG with n nodes, then the depth is n. And in order to talk about the average case you would need to define the random distribution from which DAGs are sampled… not an easy thing to do. Of course you could do an empirical measurement. If you measure the DAG depth among Python repos on Github, maybe it turns out to be O(log n). Anyways, I’m not sure what O(n) vs O(log n) depth has to do with readability and maintainability of a code base, though I suppose it does help to clarify and possibly reduce the amount of code a new contributor needs to read before adding code to a particular module.

3 Likes

I’m a little confused about how you can actually do this. I’ve tried a few things with no success, including the following:

julia> module A
           Main.x = 1
       end
ERROR: cannot assign variables in other modules

julia> module MyMain
           module A
               MyMain.x = 1
           end
       end
ERROR: UndefVarError: MyMain not defined
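
For what it’s worth, plain assignment into another module is deliberately disallowed, and as far as I know the only way around it is eval, which is exactly why Main.a = i counts as suspicious behaviour:

```julia
julia> module A end;

julia> A.x = 1
ERROR: cannot assign variables in other modules

julia> Core.eval(A, :(x = 1))   # evaluate the assignment *inside* A
1

julia> A.x
1
```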