Hey folks. I’m new to Julia, and loving every aspect of it… except for ways of splitting code over multiple files and folders. There, I swiftly discovered that existing ways just don’t really seem to cut it.
Fundamentally, you have to deal with files being included multiple times. The usual solutions to this problem (only including files once; using header guards; only including at the top-level) each have a cost: they make dependencies harder to reason about (only including once), entail extra maintenance work (header guards), or pollute namespaces (including at top-level).
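To make the multiple-inclusion issue concrete, here is a minimal sketch. The file name point.jl, the modules A and B, and the Point type are all made up for illustration, and the file is written to a temporary directory so the snippet runs on its own.

```julia
# Hypothetical example: one source file that two different modules both include.
dir = mktempdir()
path = joinpath(dir, "point.jl")
write(path, """
struct Point
    x::Float64
end
""")

module A end
module B end

# In a real package each of these would be `include("point.jl")` written
# inside the module body.
Base.include(A, path)
Base.include(B, path)

# The same source text has now produced two unrelated types.
println(A.Point === B.Point)        # false
println(A.Point(1.0) isa B.Point)   # false
```

Anything written against A.Point will refuse a B.Point, which is why a file that ends up included from two places tends to cause confusing method errors rather than an obvious failure at the include itself.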
It’s said that code is read more than it is written, so easy-to-understand code is important. This definitely isn’t the easy-to-use import system that it could be. (Contrast e.g. Python – whose import system has a couple warts, but does generally get the job done.)
Fortunately, Julia’s incredible metaprogramming offers a solution. A little bit of macro magic later, I’ve put together PatModules.jl. (I’m not great at names…) It’s a deliberately minimal attempt to fix the greatest pain points, without trying to do anything too clever: after macro expansion it should look like normal Julia code. Its core design goal is that you never have to write include ever again, and in doing so avoid these various issues.
So: thoughts, feedback? Obviously folks have been getting by so far without anything like this, but at least speaking for myself it’s made my code a lot easier to reason about.
My impression is that in practice, very closely related files are typically joined via include. E.g. a module A might be little more than a list of include calls wrapped in module A … end # module, and anyone outside of module A who needs A’s functionality would/should just write using A. But I suppose this is the two-step process you mention in the readme (very well written, by the way), and for complicated modules perhaps the ordering of import statements can be tricky.
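A minimal sketch of that include-based pattern follows. The module name Temperature, the file names types.jl and convert.jl, and their contents are all hypothetical; the files are generated in a temporary directory so the example is self-contained.

```julia
# Hypothetical layout: two closely related files joined into one module.
dir = mktempdir()
write(joinpath(dir, "types.jl"), """
struct Celsius
    degrees::Float64
end
""")
write(joinpath(dir, "convert.jl"), """
# Relies on Celsius already being defined, so types.jl must be included first.
to_fahrenheit(c::Celsius) = c.degrees * 9 / 5 + 32
""")

module Temperature
export Celsius, to_fahrenheit
end

# In a package these would be `include("types.jl")` and `include("convert.jl")`
# written inside the module body, in this order.
Base.include(Temperature, joinpath(dir, "types.jl"))
Base.include(Temperature, joinpath(dir, "convert.jl"))

# Code outside the module never touches the files; it just uses the module.
using .Temperature
println(to_fahrenheit(Celsius(100.0)))   # 212.0
```

Swapping the two include calls fails, because the method signature in convert.jl needs Celsius to exist already. That ordering constraint is the part that gets tricky as the list of files grows.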
I also wonder whether this bit of the manual about project directory environments (from the code loading section) is a partial solution:
A package directory is a directory containing the source trees of a set of packages as subdirectories, and forms an implicit environment. If X is a subdirectory of a package directory and X/src/X.jl exists, then the package X is available in the package directory environment and X/src/X.jl is the source file by which it is loaded.
Though I must say I’ve never been able to get the manual-described behavior to work…
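For reference, this is roughly what I understand the manual passage to describe, as a sketch rather than a tested recipe for every Julia version. The package name X comes from the quoted text; the temporary directory and the greet function are made up for illustration.

```julia
# Hypothetical package directory holding one package, X, laid out as X/src/X.jl.
envdir = mktempdir()
mkpath(joinpath(envdir, "X", "src"))
write(joinpath(envdir, "X", "src", "X.jl"), """
module X
greet() = "hello from X"
end
""")

# A LOAD_PATH entry that is a plain directory (no Project.toml) is treated
# as a package directory, i.e. an implicit environment.
push!(LOAD_PATH, envdir)

using X
println(X.greet())
```

The key assumption is that the directory added to LOAD_PATH is the parent of X, not X itself or X/src.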
I’d be very curious to hear how more experienced community members than me have managed the include process in large projects, and especially curious to hear language and Pkg designer Stefan’s thoughts on include.
I am glad you have made a package that implements a file-module relationship you like.
I, and others, find the separation of the file from the module, and of include from using, to be a feature. “Namespaces are overrated, let’s do less of those”.
We also favour 1 module per package (if it is separate enough to be put in a submodule, it is separate enough to go in its own package, and then you have the package manager and semver to help you with gradual upgrades).
There are a bunch of other reasons too, which it’s not really worth going into.
I would be extremely cautious about making claims like “Writing modular, reusable code in Julia is harder than in other languages.”
I would say that separating code out into more packages rather than submodules is a much better way to get reusability and modularity.
The two biggest ecosystems in Julia that I am aware of are (my employer) Invenia’s production system, and the DiffEqverse. In both cases we are talking about over 400K lines of code, dozens of separate contributors and many parallel teams working on different parts, and in both cases it is broken up into over 50 packages, with almost no submodules, and with each package generally including between 2 and 12 files.
Anyway, I am pleased you are making a thing you like,
and this is certainly a cool use of the language, quite the deep dive for your first (?) package.
But I would say you should pitch it more as an alternative that might be more familiar to some, rather than saying it is better.
Saying it is better is asking for fights.
However, it is pretty common for large packages to consist of many modules. Just two examples I read recently: Pkg.jl (included with Julia, but still, it doesn’t make much sense to split its modules into separate packages) and Pluto.jl.
I had the same thought, and I dislike using include. However, as many may not have noticed, this is an issue that was discussed and passed triage a long time ago; it’s something that just requires a PR: https://github.com/JuliaLang/julia/issues/4600
This gives you something similar to Rust’s namespaces, and I’m already happy with 4600.
There shouldn’t be any more design phase on this. The design there is considered compatible with 1.x versions. That’s why it was moved out of the 1.0 release milestone.
I don’t personally like the design of PatModules either, however. IMHO it adds extra noise to the using and import statements and makes them too complicated.
I like the design which has been discussed in issue 4600, and I just hope someone will be the hero on that, or maybe I’ll do it myself at some point.
Eventually, as long as the feature proposed in 4600 is implemented, one will not need include any more, just using/import like in many other “modern” languages.
Of course this may not be for everybody, and I’m not here to try and convince anyone to switch. Just making something available for those who may be interested.
Still, for those curious, to give an explicit example of the sort of thing that this tries to fix - what is OrdinaryDiffEqConstantCache here? (From OrdinaryDiffEq.jl.)
It’s just implicitly assumed to exist, but the reader is given no help finding where it comes from. It’s unsurprising to find that this file is included in the wrapping OrdinaryDiffEq.jl, but that doesn’t really help. All we know is that OrdinaryDiffEqConstantCache is something else that’s included in the same file, which prompts a bit of a manhunt.
That’s probably no issue if you’re intimately familiar with the package (maybe you are the developer) and have pretty much the whole thing in your head anyway, but it’s making life pretty difficult for everyone else. This is no bash on Chris/SciML by the way, I continue to be very impressed with the project.
I’m aware of #4600. Honestly I’m not a fan. At the moment include just handles files; import/using just handles modules. Muddying that seems like a mistake. EDIT: #4600’s treatment of files/folders/modules is actually exactly what PatModules does – that’s clearly a good thing to my mind. It’s just the overloading of import/using that seems off to me.
Let me start by saying this is a cool idea and this can be helpful to some people. Welcome to the community. One package that should consider using this is QuantumOptics.jl (@david-pl). @PetrKryslUCSD’s FinETools.jl should consider it too.
That said… there are comments which go beyond helpful to a different territory.
How is having one file and one place to check for all dependencies harder than having to look at the whole package? Localizing information is usually more efficient than globalizing it. These days I would recommend doing all imports in one spot of a package, so that it’s easy to know the entire namespace in one glance. (Though SciML has generally overused using in the past and should correct that.)
As a larger personal point, in the packages I’ve worked on where submodules were used (in more than just Julia), they were either unnecessary extra work or trying to hide the fact that the programmer should’ve used more explicit variable names and self-documenting internals. If different functions collide but do different things, then they were named ambiguously and that’s your real problem. That’s a personal point that I could see someone disagreeing with. But seeing packages not using submodules and saying that means there are bad programming practices is a complete misunderstanding of what’s required for maintainable code.
In fact, I would say that’s vehemently not true. In other languages, neural boundary value problems with quaternion states are a big exciting feature you write a package for, and in Julia it’s a footnote of what happens when you stick a neural network from the standard ML package in the standard differential equation package and run the standard autodiff on it (for reference, I know this works because someone shared a result). This has its own interesting issue: discoverability is hard, because the feature arises purely from composability, so “does feature X exist?” comes down to “did you slam package A against package B and plot the result?”. But I digress.
If you have code that’s so modular that it’s a completely separate entity, why stick that in as a submodule instead of making it its own package? The lack of composability of ecosystems like Python tends to make people write monorepos, where something like PyTorch is a good example that has a JIT, an autodiff library, a nonlinear solver, a linear algebra library, etc. Those aren’t submodules: those are libraries. It’s a design limitation that makes it so that an arbitrary nonlinear solver implementation isn’t compatible with PyTorch tensor objects; otherwise PyTorch could be mostly discarded and lots of its operations could just be calls to NumPy/SciPy. Note torch.numpy is a separate implementation from the numpy library, and the fact that that submodule exists is purely because of this inability to compose the code.
What you see instead with Julia is code composability, which has a tendency to split modules. The ODE solver allows CuArrays. The AD works with the ODE solver and the CuArrays. So what you end up with is not an ODE solver package with submodules for AD and GPU kernels, but instead an ODE package, a GPU package, and an AD package. That makes it more like the Linux ecosystem, where it’s about building good standalone composable pieces rather than trying to make one piece of software provide the world.
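As a toy illustration of that kind of composability (a sketch only, and certainly not how OrdinaryDiffEq.jl is implemented), here is a fixed-step Euler integrator that never mentions a concrete number or array type. The names euler, growth and decay are made up for this example.

```julia
# A deliberately tiny fixed-step Euler integrator. It only assumes that the
# state supports `+` and multiplication by the step size.
function euler(f, u0, dt, nsteps)
    u = u0
    for _ in 1:nsteps
        u = u + dt * f(u)
    end
    return u
end

growth(u) = u    # du/dt = u
decay(u) = -u    # du/dt = -u

println(euler(growth, 1.0, 0.5, 2))         # Float64 state
println(euler(growth, 1 // 1, 1 // 2, 2))   # exact Rational state
println(euler(decay, [1.0, 2.0], 0.5, 1))   # Vector state
```

The same function body handles all three state types without a submodule per type. Presumably that is the mechanism being described: a GPU array or an AD number type is just another state type that implements the same operations.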
In the end, if it really is a separated enough idea to say it should be kept in a separate world, shouldn’t it have its own versioning? Its own full set of tests and downstream testing to multiple packages? Won’t it have its own downstream users which might not be the original intended one? To me that’s a package, not a submodule.
@edit sends you right to the line of code. FWIW, that’s an old piece of code from a portion that @YingboMa never really finished and probably should be deleted. The correct way of doing that should be isinplace(integrator) which is pretty explicit. 3 years ago was a pretty different time.
There’s no doubt that OrdinaryDiffEq.jl is huge, but submodules aren’t the answer there, and its structure is also rather simple. Every year a good number of undergraduates contribute a new method to the package with just the help of http://devdocs.sciml.ai/latest/. Good clean structure is what makes code easy to understand. I don’t think adding boilerplate for 50 lines of imports at the top of each file makes code legible. It should just read as a book: each algorithm has a cache and possibly tableaus, the perform step dispatches are then defined over those, and the integrator loop runs over them.
Organizing some of the front level into clearer folders and maybe making some naming more explicit would do the job, but I don’t think making every file longer to repeat the same thing is helpful. While I say it should read like a book, I should put a disclaimer that it shouldn’t read like Charles Dickens.
Because it may not make sense: making some internal functions available as a package takes much more effort than using them internally at the beginning. The package Yao was a monorepo for more than a year, and I split everything out later. And Comonicon is still a monorepo of parsers, codegen, CLI building tools and static compile tools. Why haven’t I put them into packages? Simply because I don’t have time yet, and it’s always simpler and more convenient to put things into a module for now.
Or should we just disable the ability to use multi-hierarchy modules in Julia altogether, if this is not good practice? Apparently not.
And as I have mentioned, the feature implemented in this package is well discussed in issue 4600. It just requires someone to work on it.
PyTorch is a monorepo for other reasons, not because you can’t split things out in Python. In fact, a lot of Python packages split things out and use a meta-package as we do in Julia, e.g. qiskit. I don’t think this is an actual issue for Python.
@patrick-kidger Welcome to the Julia community! Thanks for diving in with a new package. I just have one comment in response to this:
Yes, it can be a little tricky to figure out where things are coming from sometimes, but this can be alleviated by IDE tools. For example, in VS Code there are “Peek Definition” and “Go to Definition” commands that will show you or take you to the definition of the object of interest. And as @ChrisRackauckas mentioned, there are also macros that you can use in the REPL like @less and @edit.
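For anyone who hasn’t used these, a small sketch of the non-interactive cousin of @less and @edit, namely @which, which is also provided by InteractiveUtils. The call to sum is just an arbitrary example.

```julia
using InteractiveUtils  # loaded automatically in the REPL, needed explicitly in scripts

# Ask which method a call would dispatch to. The printed Method includes the
# file and line of its definition.
m = @which sum([1, 2, 3])
println(m)

# The module that owns that method.
println(parentmodule(m))

# Interactively, `@less sum([1, 2, 3])` pages through that source and
# `@edit sum([1, 2, 3])` opens it in your editor.
```

All three macros take the same form, an ordinary call expression, so you can paste the line you are puzzled by straight after the macro name.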
This is specifically about doing “relative” includes/imports to code within your package. I agree importing outside packages is standard / fine.
Never said that. Indeed writing this out in normal Julia w/ modules would be easy to get wrong. The use of modules in PatModules.jl is just a convenient way to namespace things.
The discussion related to #4600 is actually pretty interesting. Other than the fact that it’s overloading the import/using statements, it seems to be proposing to do exactly what PatModules actually does: have modules, packages, and files all share names, and to import module and include file simultaneously. I think if you’re a fan of one you’ll probably be a fan of the other.
Indeed, I’ve got no interest in registering a package that I put all my models in, and another package for my training loop, another package for my datasets, and so on. (Nor is putting them together in one blob a good way to manage the boundaries between these different subsystems.)
I’m not convinced that relying on IDEs is a good way to get around concerns with the language…
I see. Well, this is one of the things that I believe improve the access to the logic (readability of the code). The top module is the place where the reader can find where each function and type comes from.
It is also the only place where exports occur. Again, improving legibility.
By the way, one of the reasons that Julia doesn’t use namespaces as aggressively as Python is because we group related functionality into generic functions. Instead of having List.map, String.map, Tuple.map, etc, we just have one generic function map. The emphasis is on overloading generic functions rather than putting slightly different versions of functions in separate modules. In order to fully take advantage of multiple dispatch and function overloading, you want a pretty flat namespace.
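A minimal sketch of what this looks like in practice. The modules Shapes and MoreShapes and the function area are hypothetical; the point is that the second module extends the first module’s generic function instead of defining its own MoreShapes.area.

```julia
module Shapes
export area
struct Circle
    r::Float64
end
area(c::Circle) = pi * c.r^2
end

module MoreShapes
import ..Shapes: area   # extend the existing generic function, don't shadow it
struct Square
    side::Float64
end
area(s::Square) = s.side^2
end

using .Shapes
println(area(Shapes.Circle(1.0)))
println(area(MoreShapes.Square(2.0)))   # 4.0
println(length(methods(area)))          # 2: one function, two methods
```

Callers only ever need the one name area, whichever module their type lives in, which is why a flat namespace works well with this style.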
I think one takeaway from this thread is that sometimes it’s good to solicit feedback from the community before embarking on a new project, rather than after. It’s not uncommon for new Julia users to come to this forum and say, “You’re doing it wrong.” Unsurprisingly, those posts usually receive a bit of pushback.
(The List.map, String.map, Tuple.map example is taken from languages like Erlang and Elm.)
Haha, to be clear, none of what I’m saying is meant to be a “you’re doing it wrong”. I expected a fair amount of pushback when I published this actually, as I can see that the Julia community is nothing if not strongly opinionated.
One thing I particularly like about Julia is the ability to construct these sorts of import systems if necessary. So you can code Julia your way and I can do it mine and we can both be happy. (Admittedly the metaprogramming involved is a bit of a barrier though.)
You may be interested to know about this technique to find this out using the REPL
Though I admit having to load a package to ask where something is defined is a bit of a twist, and if you are just reading the code it is kind of annoying.
It’s a fair complaint.
(Though, I personally prefer it to the trade-off of listing all imports. But I understand others disagree)
Just thought I would bring up this “trick” of using the REPL to ask where something is defined even if you don’t know the module, as it is, I think, underappreciated.
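One way of asking that question, as a sketch (not necessarily the exact technique shown earlier in the thread). The modules BigPackage and Caches and the type ConstantCache are hypothetical stand-ins for a large code base.

```julia
# Hypothetical package with a nested module that re-exports one of its types.
module BigPackage
module Caches
struct ConstantCache
    n::Int
end
end
using .Caches: ConstantCache
export ConstantCache
end

using .BigPackage

# The name is visible here, but where does it come from?
println(parentmodule(ConstantCache))   # the module that defines the type
println(methods(ConstantCache))        # its constructors, listed with file and line
```

parentmodule works on functions as well as types, so the same question can be asked of any name you can see, without knowing in advance which module to look in.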
I’m a bit skeptical of this point: including a file multiple times in Julia is nearly always a mistake. While I completely support the notion of helping users avoid making mistakes, I don’t think this situation is anything like C or C++, where including a file multiple times is normal and required.
Don’t know if anyone has mentioned it though, but PatModules.jl as a name, 10/10 rolls off the tongue. I wouldn’t change that.
Indeed, I think dispatch and functional styles really change the way that code is written. Explicit function names for functions that are used everywhere mean I just generally write code assuming some idea like max is extended to what I’m working on. I don’t really check what exists but just use the functions that do. Hard to explain, but it’s like the Plots alias system.