Implicitly loaded modules in the future?

This can show up in many ways though. If you have two modules which overload the same Base function with different bodies for the same signature, the final method that you get is now dependent on the DAG. If you want to hash all of the imported code to, say, track code changes and avoid unnecessary precompilation, you can have invalidation occur merely because the DAG’s heuristics change. One major global that is hit in all languages is I/O: if two modules write to a file at load time, the ordering of the file’s contents is no longer deterministic. I could keep going, but you probably get the point: this causes a ton of practical issues.
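Here is a minimal sketch of the first problem (MyType is a stand-in type): two modules define the same method, and whichever one the loader happens to run last silently wins, so changing the DAG changes the program.

struct MyType end

module A
    Base.show(io::IO, ::Main.MyType) = print(io, "A's show")
end

module B
    Base.show(io::IO, ::Main.MyType) = print(io, "B's show")
end

show(stdout, MyType())  # prints "B's show": B's definition overwrote A's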

That of course doesn’t make being explicit better; it’s just the trade-off that’s being made. One way to avoid these issues is to claim that anything that would be non-deterministic given this behavior is simply bad coding style. Sure, but you can see what happened to the Python ecosystem with this: many packages simply do not work with alternative Python compilers because they depend on internal details of the CPython implementation. This has caused many issues, one notable consequence being that it pretty much halted the adoption of alternative interpreter implementations like PyPy.

With Julia, one of the core concepts being explored is to have a very small compiler surface so that alternative interpreters (JuliaInterpreter.jl, the new Abstract Interpreter, Mjolnir, etc.) can all take Julia code, easily “know what to do”, and compose. The more details that tie code execution order and output to specific heuristics of the compiler, the harder it is to support such an ecosystem in a way where using these alternatives on a package can be expected to “just work”. Of course, there are always compiler heuristics that can be helpful, and it’s more of a “degree of dependence” (one important one is how reliant these compiler tools can be on type-inference heuristics).

14 Likes

Oh yeah, that one was patched out of the language to be illegal. But with Refs it still works:

x = Ref(2)
module A
    Main.x[] = 1   # mutating a Ref owned by another module is still allowed
end
using .A
@show x            # x = Base.RefValue{Int64}(1): loading A mutated it

This is actually a nice way to debug FWIW, because you can grab pieces of a big function to play with later. Then of course you can do this with arrays, files, etc.
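As a minimal sketch of that debugging pattern (all names here are made up): a global Ref lets you stash an intermediate value from deep inside a function and play with it at the REPL afterwards.

const captured = Ref{Any}(nothing)

function bigfunction(x)
    y = x .^ 2 .+ 1    # stand-in for some expensive intermediate step
    captured[] = y     # grab the intermediate for later inspection
    return sum(y)
end

bigfunction(1:3)   # returns 17
captured[]         # 3-element Vector{Int64}: [2, 5, 10]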

4 Likes

Additionally, I don’t think “easy code to read” is always the best goal.

If we require import-from-file statements everywhere, that makes code easy to read, but it makes code much harder for developers to write: they have to keep track of which files define which names. I would bet people spend far more time writing code than reading it, so we should prioritize the writer’s ease of use over the “reader”.

Furthermore, I’m not sure how much being explicit about the names of imported objects gets you. If one file defines foo(x::X) and another file defines foo(y::Y), what does knowing import foo from file.jl tell us exactly?

3 Likes

Usually the opposite is claimed on the internet, along the lines of “code is read more often than it is written”. I suppose it depends on the organization. If a person is writing research code at a university, maybe their code will never be read. (Especially if the code is not on GitHub, which is still common in academic research… :joy:)

17 Likes

I would argue this is worse with the liberal use of include right now: not only does one have to keep track of which files define which names “down” the DAG, but also “up”, in the files that include a given file.

My understanding is that it’s the exact opposite, namely code is read far more than it is written.

1 Like

Closer, but I’d change it once more: code is maintained for far longer than its initial development. What makes it easiest for someone to update package X when dependency Y eventually makes a breaking change? Whether you’re maintaining a major OSS organization or just your own simulation code, this is an issue in every language, and one where people spend either far too much time or not enough (academia :eyes:). This points to multiple principles: the classic one of valuing legibility over writing speed, but also others, like preferring behaviors that don’t silently change.

[One example of this last one is reliance on Base’s random number generators in package tests: if you need specific random numbers for a test to work, you are giving maintenance trouble to somebody in the future!]
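A minimal sketch of one common mitigation, using the StableRNGs.jl package, whose streams are guaranteed stable across Julia releases (Base’s default RNG makes no such guarantee):

using StableRNGs

rng = StableRNG(123)   # seeded generator, independent of Base's default RNG
x = rand(rng, 10)      # the same 10 numbers on every Julia release, so tests
                       # asserting properties of x won't silently break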

12 Likes

My assumption is that the DAG for a well-constructed project looks something like a tree (or some not-too-complicated quotient of one), which should give roughly O(log n) depth.

Mildly hand-wavy but not too far off my experience for what happens in practice. As you say, the key point is to reduce the amount of code a (new or returning) contributor needs to read.

I agree, these are all reasonable points.

2 Likes

Can you demonstrate with a non-trivial example that what you propose would work? For instance using FromFile?

1 Like

This is what is constantly repeated on the internet, but my experience is the opposite, by at least an order of magnitude. So, I agree with @pdeffebach, and it’s refreshing to actually see someone state this for once 😁

I guess it depends on the kind of code, though.

1 Like

I’ll admit I can’t think of a single scenario where code is read back no more than once, except when the code in question is purely ephemeral, e.g. exploratory analytics. In that particular case, permanence and modularity don’t matter anyway and this whole topic doesn’t apply :man_shrugging:. On the other hand, most code with any kind of longevity is read by at least two pairs of eyes, because people don’t stay in the same academic/industry position forever and handoffs need to happen. The amount of reading grows superlinearly with the number of collaborators; e.g. open source libraries have an order of magnitude more readers than writers.

4 Likes

Sure. I’ll use an example from @MilesCranmer’s excellent SymbolicRegression.jl package.

There’s a reasonable number of files in this package, and I can’t really claim close familiarity with its internal workings. But opening up a random file:

I can immediately see where to find anything I need to understand this file, explicitly imported at the top of the file. It’s possible for me to understand this file without having to understand the other 20-odd files elsewhere in the package. Which is great when one inevitably needs to read the source code of one’s dependencies, to figure out some edge case etc.


On the topic of code being read more often than it is written: this is definitely true in my experience. FWIW most of my work is either on open source libraries or group-internal codebases, with several collaborators.

13 Likes

Hmm. This could have just as well been written

using Random: shuffle!
using Core: Options, Dataset, RecordType
using EquationUtils: stringTree
using PopMember: PopMember
using Population: Population, bestOfSample
using Mutate: nextGeneration
using Recorder: @recorder

if all the files corresponded to modules (inside a package these would be relative imports, e.g. using ..Core: Options from a sibling module). No loss of legibility…

3 Likes

This still requires a master file somewhere to include them all.

Which can work! If you follow this to its logical conclusion you end up with the original PatModules.jl design, with “main files” (which do this grouping together) and “auxiliary files” (with the actual content). This takes a bit of extra effort – the master file for the includes, the wrapping files into modules, etc. – and limits you to a single entry point into the structure (the master file).
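As a minimal sketch of that pattern (all file and function names made up): the “main file” is the single entry point, and each “auxiliary file” wraps its content in a module with its dependencies declared explicitly.

# MyPackage.jl: the "main file" and single entry point
module MyPackage
include("utils.jl")      # defines module Utils
include("solver.jl")     # defines module Solver
using .Solver: solve
export solve
end

# utils.jl: an "auxiliary file"
module Utils
helper(x) = 2x
end

# solver.jl: another "auxiliary file"
module Solver
using ..Utils: helper    # explicit dependency on a sibling module
solve(x) = helper(x) + 1
end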

But of course, why settle for half-measures? We introduced FromFile.jl to fix these issues and streamline the experience. (EDIT: Although, to be clear, I think your proposal/PatModules.jl is still better than not doing anything like this at all.)

5 Likes

The master file needs to be there anyway! The package structure requires it…

But I still don’t see what is being gained here with the proposed approach.

With FromFile.jl there is no need for a master file which includes everything.

BTW, I got curious so I typed in “Python module issues” and a good list came up:

http://python-notes.curiousefficiency.org/en/latest/python_concepts/import_traps.html

It pretty much outlines what I just mentioned as the “not beginner friendly” parts of the system: precisely the fact that there is a lot of spooky action at a distance. Here’s a fun one:

The double import trap
As an example, Django (up to and including version 1.3) used to be guilty of setting up exactly this situation for site-specific applications - the application ends up being accessible as both app and site.app in the module namespace, and these are actually two different copies of the module. This is a recipe for confusion if there is any meaningful mutable module level state, so this behaviour was eliminated from the default project layout in version 1.4 (site-specific apps now always need to be fully qualified with the site name, as described in the release notes).

I actually remember that now that it’s mentioned, and I never understood what was going on! Now I know that it was a module import issue that went unfixed for half a decade! If one of the most widely used packages in the ecosystem can fall for this trap, then it’s a pretty rough one.

But anyways, I don’t want that to start a language war. The point is really that every choice has trade-offs. Julia’s current system is the most barebones explicit one you could choose: no implicit modules, no code that automatically shows up defined, no modules automatically named to match file names, etc. The downside is that you have to import and you only get what you import. You can keep making parts of that more and more automatic, making more assumptions and having new free code show up, and then it depends more and more on a complex set of rules which hopefully tends to work most of the time.

What’s the right spot in this continuum to sit? Personally I think it’s good to have the explicit include, and then a little bit of syntactic sugar on top: maybe one or two short-hands for common things, like import ... as and FromFile.jl. Beyond some short list, :man_shrugging: send it to a package. You already see everyone overuse using because the world is too complex to weigh using vs import before writing your first package, and adding 14 new options wouldn’t make that any easier. It’s very easy to go overboard and add every feature you can think of to the language (hi, C++), but that increases the surface of compiler support, which makes maintaining alternative implementations harder (JuliaInterpreter for the debugger, IRTools, etc.) and makes training beginners harder.

24 Likes

The main downside for me is that when I’m reading a file I don’t know what the names refer to.

So I actually agree with all of this (and could add a few to the list myself). Python’s import system is one of the worst parts of the language, IMO.

FromFile definitely isn’t meant to emulate Python, despite the surface similarity.
(I’m not trying to make this a thread about FromFile, incidentally…)

I like this idea a lot if include can be made strictly less powerful. That is, every file needs to explicitly declare any outside, non-Base imports it needs instead of assuming they exist in the ambient context. How that would look I’m not sure, but it would reduce the maximum “blast area” of include-ing a file from essentially infinite to something more manageable; one possible shape is sketched below.
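For instance, a minimal sketch using FromFile.jl (file and function names here are made up): every file states its full dependency set up front, so reading it never requires knowing what some distant include chain happened to define.

# utils.jl: declares everything it uses, nothing from the ambient context
using FromFile
using LinearAlgebra: norm            # stdlib dependency, stated explicitly
@from "helpers.jl" import helper     # sibling-file dependency, stated explicitly

normalize_all(xs) = [helper(x) / norm(x) for x in xs]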

1 Like

That’s not really true. There’s still the top-level file that defines the module for the package. That file has to use @from to import all the symbols that you want to export from your package. So it ends up looking about the same as just including a bunch of files:
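Something like this minimal sketch (package and symbol names hypothetical):

# src/MyPackage.jl: the top-level file that FromFile.jl still needs
module MyPackage

using FromFile
@from "foo.jl" import foo
@from "bar.jl" import bar

export foo, bar

end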