Implicitly loaded modules in the future?

This can show up in many ways though. If you have two modules which overload the same Base function with different bodies for the same signature, the final method that you get is now dependent on the DAG. If you want to hash all of the imported code to, say, track code changes and avoid unnecessary precompilation, you can have invalidation occur merely because the DAG’s heuristics change. One major global that is hit in all languages is I/O: if two modules write to a file at load time, the ordering of the file’s contents is no longer deterministic. I could keep going, but you probably get the point: this causes a ton of practical issues.
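Here is a minimal sketch of the first problem (MyType is a stand-in type): two modules define the same method, and whichever one the loader happens to run last silently wins, so changing the DAG changes the program.

struct MyType end

module A
    Base.show(io::IO, ::Main.MyType) = print(io, "A's show")
end

module B
    Base.show(io::IO, ::Main.MyType) = print(io, "B's show")
end

show(stdout, MyType())  # prints "B's show": B's definition overwrote A's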

That of course doesn’t make being explicit better; it’s just the trade-off that’s being made. One way to avoid these issues is to claim that anything that would be non-deterministic given this behavior is simply bad coding style. Sure, but you can see what happened to the Python ecosystem with this: many packages simply do not work with alternative Python compilers because they depend on internal details of the CPython implementation. This has caused many issues, one notable consequence being that it pretty much halted the adoption of alternative interpreter implementations like PyPy.

With Julia, one of the core concepts being explored is to have a very small compiler surface so that alternative interpreters (JuliaInterpreter.jl, the new Abstract Interpreter, Mjolnir, etc.) can all take Julia code, easily “know what to do”, and compose. The more details that tie code execution order and output to specific heuristics of the compiler, the harder it is to support such an ecosystem in a way where using these alternatives on a package can be expected to “just work”. Of course, there are always compiler heuristics that can be helpful, and it’s more of a “degree of dependence” (one important one is how reliant these compiler tools can be on type-inference heuristics).

14 Likes

Oh yeah, that one was patched out of the language to be illegal. But with Refs it still works:

x = Ref(2)
module A
    Main.x[] = 1   # mutating a Ref owned by another module is still allowed
end
using .A
@show x            # x = Base.RefValue{Int64}(1): loading A mutated it

This is actually a nice way to debug FWIW, because you can grab pieces of a big function to play with later. Then of course you can do this with arrays, files, etc.
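As a minimal sketch of that debugging pattern (all names here are made up): a global Ref lets you stash an intermediate value from deep inside a function and play with it at the REPL afterwards.

const captured = Ref{Any}(nothing)

function bigfunction(x)
    y = x .^ 2 .+ 1    # stand-in for some expensive intermediate step
    captured[] = y     # grab the intermediate for later inspection
    return sum(y)
end

bigfunction(1:3)   # returns 17
captured[]         # 3-element Vector{Int64}: [2, 5, 10]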

4 Likes

Additionally, I don’t think “easy code to read” is always the best goal.

If we require import-from-file statements everywhere, that makes code easy to read, but it makes code much harder for developers to write: they have to keep track of which files define which names. I would bet people spend far more time writing code than reading it, so we should prioritize the writer’s ease of use over the “reader”.

Furthermore, I’m not sure how much being explicit about the names of imported objects gets you. If one file defines foo(x::X) and another file defines foo(y::Y), what does knowing import foo from file.jl tell us exactly?

3 Likes

Usually the opposite is claimed on the internet, along the lines of “code is read more often than it is written”. I suppose it depends on the organization. If a person is writing research code at a university, maybe their code will never be read. (Especially if the code is not on GitHub, which is still common in academic research… :joy:)

17 Likes

I would argue this is worse with the liberal use of include right now: not only does one have to keep track of which files define which names “down” the DAG, but also “up”, in the files that include a given file.

My understanding is that it’s the exact opposite, namely code is read far more than it is written.

1 Like

Closer, but I’d change it once more: code is maintained for far longer than its initial development. What makes it easiest for someone to update package X when dependency Y eventually makes a breaking change? Whether you’re maintaining a major OSS organization or just your own simulation code, this is an issue in every language, and one where people spend either far too much time or not enough (academia :eyes:). This points to multiple principles: the classic one of valuing legibility over writing speed, but also others, like preferring behaviors that don’t silently change.

[One example of this last one is reliance on Base’s random number generators in package tests: if you need specific random numbers for a test to work, you are giving maintenance trouble to somebody in the future!]
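A minimal sketch of one common mitigation, using the StableRNGs.jl package, whose streams are guaranteed stable across Julia releases (Base’s default RNG makes no such guarantee):

using StableRNGs

rng = StableRNG(123)   # seeded generator, independent of Base's default RNG
x = rand(rng, 10)      # the same 10 numbers on every Julia release, so tests
                       # asserting properties of x won't silently break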

12 Likes

My assumption is that the DAG for a well-constructed project looks something like a tree (or some not-too-complicated quotient of one), which should give roughly O(log n) depth.

Mildly hand-wavy but not too far off my experience for what happens in practice. As you say, the key point is to reduce the amount of code a (new or returning) contributor needs to read.

I agree, these are all reasonable points.

2 Likes

Can you demonstrate with a non-trivial example that what you propose would work? For instance using FromFile?

1 Like

This is what is constantly repeated on the internet, but my experience is the opposite, by at least an order of magnitude. So, I agree with @pdeffebach, and it’s refreshing to actually see someone state this for once 😁

I guess it depends on the kind of code, though.

1 Like

I’ll admit I can’t think of a single scenario where code is read back no more than once, except when the code in question is purely ephemeral, e.g. exploratory analytics. In that particular case, permanence and modularity don’t matter anyway and this whole topic doesn’t apply :man_shrugging:. On the other hand, most code with any kind of longevity is read by at least two pairs of eyes, because people don’t stay in the same academic/industry position forever and handoffs need to happen. The amount of reading grows superlinearly with the number of collaborators; e.g. open source libraries have an order of magnitude more readers than writers.

4 Likes

Sure. I’ll use an example from @MilesCranmer’s excellent SymbolicRegression.jl package.

There’s a reasonable number of files in this package, and I can’t really claim close familiarity with its internal workings. But opening up a random file:

I can immediately see where to find anything I need to understand this file, explicitly imported at the top of the file. It’s possible for me to understand this file without having to understand the other 20-odd files elsewhere in the package. Which is great when one inevitably needs to read the source code of one’s dependencies, to figure out some edge case etc.


On the topic of code being read more often than it is written: this is definitely true in my experience. FWIW most of my work is either on open source libraries or group-internal codebases, with several collaborators.

13 Likes

Hmm. This could have just as well been written

using Random: shuffle!
using Core: Options, Dataset, RecordType
using EquationUtils: stringTree
using PopMember: PopMember
using Population: Population, bestOfSample
using Mutate: nextGeneration
using Recorder: @recorder

if all the files corresponded to modules (inside a package these would be relative imports, e.g. using ..Core: Options from a sibling module). No loss of legibility…

3 Likes

This still requires a master file somewhere to include them all.

Which can work! If you follow this to its logical conclusion you end up with the original PatModules.jl design, with “main files” (which do this grouping together) and “auxiliary files” (with the actual content). This takes a bit of extra effort – the master file for the includes, the wrapping files into modules, etc. – and limits you to a single entry point into the structure (the master file).
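As a minimal sketch of that pattern (all file and function names made up): the “main file” is the single entry point, and each “auxiliary file” wraps its content in a module with its dependencies declared explicitly.

# MyPackage.jl: the "main file" and single entry point
module MyPackage
include("utils.jl")      # defines module Utils
include("solver.jl")     # defines module Solver
using .Solver: solve
export solve
end

# utils.jl: an "auxiliary file"
module Utils
helper(x) = 2x
end

# solver.jl: another "auxiliary file"
module Solver
using ..Utils: helper    # explicit dependency on a sibling module
solve(x) = helper(x) + 1
end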

But of course, why settle for half-measures? We introduced FromFile.jl to fix these issues and streamline the experience. (EDIT: Although, to be clear, I think your proposal/PatModules.jl is still better than not doing anything like this at all.)

5 Likes

The master file needs to be there anyway! The package structure requires it…

But I still don’t see what is being gained here with the proposed approach.

With FromFile.jl there is no need for a master file which includes everything.

BTW, I got curious so I typed in “Python module issues” and a good list came up:

http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_traps.html

It pretty much outlines what I just mentioned as the “not beginner friendly” parts of the system: precisely the fact that there is a lot of spooky action at a distance. Here’s a fun one:

The double import trap
As an example, Django (up to and including version 1.3) used to be guilty of setting up exactly this situation for site-specific applications - the application ends up being accessible as both app and site.app in the module namespace, and these are actually two different copies of the module. This is a recipe for confusion if there is any meaningful mutable module level state, so this behaviour was eliminated from the default project layout in version 1.4 (site-specific apps now always need to be fully qualified with the site name, as described in the release notes).

I actually remember that now that it’s mentioned, and I never understood what was going on! Now I know that it was a module import issue that went unfixed for half a decade! If one of the most widely used packages in the ecosystem can fall for this trap, then it’s a pretty rough one.

But anyways, I don’t want that to start a language war. The point is really that every choice has trade-offs. Julia’s current system is the most barebones explicit one you could choose: no implicit modules, no code that automatically shows up defined, no modules automatically named to match file names, etc. The downside is that you have to import and you only get what you import. You can keep making parts of that more and more automatic, making more assumptions and having new free code show up, and then it depends more and more on a complex set of rules which hopefully tends to work most of the time.

What’s the right spot in this continuum to sit? Personally I think it’s good to have the explicit include, and then a little bit of syntactic sugar on top: maybe one or two short-hands for common things, like import ... as and FromFile.jl. Beyond some short list, :man_shrugging: send it to a package. You already see everyone overuse using because the world is too complex to weigh using vs import before writing your first package, and adding 14 new options wouldn’t make that any easier. It’s very easy to go overboard and add every feature you can think of to the language (hi, C++), but that increases the surface of compiler support, which makes maintaining alternative implementations harder (JuliaInterpreter for the debugger, IRTools, etc.) and makes training beginners harder.

24 Likes

The main downside for me is that when I’m reading a file I don’t know what the names refer to.

So I actually agree with all of this (and could add a few to the list myself). Python’s import system is one of the worst parts of the language, IMO.

FromFile definitely isn’t meant to emulate Python, despite the surface similarity.
(I’m not trying to make this a thread about FromFile, incidentally…)

I like this idea a lot if include can be made strictly less powerful. That is, every file needs to explicitly declare any outside, non-Base imports it needs instead of assuming they exist in the ambient context. How that would look I’m not sure, but it would reduce the maximum “blast area” of include-ing a file from essentially infinite to something more manageable; one possible shape is sketched below.
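For instance, a minimal sketch using FromFile.jl (file and function names here are made up): every file states its full dependency set up front, so reading it never requires knowing what some distant include chain happened to define.

# utils.jl: declares everything it uses, nothing from the ambient context
using FromFile
using LinearAlgebra: norm            # stdlib dependency, stated explicitly
@from "helpers.jl" import helper     # sibling-file dependency, stated explicitly

normalize_all(xs) = [helper(x) / norm(x) for x in xs]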

1 Like

That’s not really true. There’s still the top-level file that defines the module for the package. That file has to use @from to import all the symbols that you want to export from your package. So it ends up looking about the same as just including a bunch of files:
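Something like this minimal sketch (package and symbol names hypothetical):

# src/MyPackage.jl: the top-level file that FromFile.jl still needs
module MyPackage

using FromFile
@from "foo.jl" import foo
@from "bar.jl" import bar

export foo, bar

end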