Organization of multiple modules in same package

I’m developing a package which defines a number of types. Some types are helpers and some are specific implementations (parametric or abstract) of other types from this package.

I’ve been poking around popular packages to see how they handle this, and I’ve seen different styles. I’m all about extensible patterns, so if a pattern doesn’t extend gracefully I’m a little annoyed.

The most common package looks like this: One module defining more general/generic type(s) or interface(s) and including more specific code via direct file inclusion. The “module” file exports stuff for the included files.

module Foos

export Foo, bar
    
struct Foo{T}
  x::T
end

include("bar.jl") 

foo(x::Foo{T}) where T =  x.x+x.x

end

bar.jl

foo(x::Foo{String}) =  x.x*x.x

bar() = "hey"

This is really just breaking a large Foos.jl file into multiple files. Evidence of that is that Foos.jl exports bar.jl functions/types.

To extend this pattern to more types/interfaces, each in its own module, I’d need one package per type/interface and would compose them via using. If the types/interfaces are very small, I don’t know that I really want to break them up into multiple packages.

So I’ve seen this kind of pattern for multiple modules in a single package.

Foos.jl

module Foos
include("Bars.jl")
using .Bars
# ...
end

The problem here is that I can only include Bars once anywhere in Foos’s tree of includes. So if Foos includes Bars and Bazs, and Bars and Bazs both want to use a module Bats, I get an error. This isn’t very extensible.
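One way around this, which as far as I can tell is the layout FinEtools.jl (mentioned later in this thread) uses, is to include the shared module exactly once in the top-level module and let the siblings reach it through a relative path. A minimal sketch, with the include calls replaced by inline modules so it runs as a single file; the function names bat, bar and baz are made up for illustration:

```julia
module Foos

# Stand-in for include("Bats.jl"): the shared module is defined exactly once,
# and before the modules that use it.
module Bats
export bat
bat() = "bat"
end

# Stand-in for include("Bars.jl")
module Bars
using ..Bats          # "..Bats" is the Bats that lives in the enclosing module Foos
export bar
bar() = bat() * "-bar"
end

# Stand-in for include("Bazs.jl")
module Bazs
using ..Bats          # same Foos.Bats again, nothing is included twice
export baz
baz() = bat() * "-baz"
end

using .Bars, .Bazs
export bar, baz

end # module Foos

Foos.bar()   # "bat-bar"
Foos.baz()   # "bat-baz"
```

The price is that Bars.jl and Bazs.jl no longer load on their own; they rely on the top-level file having defined Bats first.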

In my specific case Bats should really be its own package. However, I could see other cases where Bats is a utility module that only makes sense in the Foos context. In that case it probably should not be a module at all, I suppose, but just an included file, and I trade an error for some warnings when the Bats functions etc. are redefined.

Seems I’m limited to linear or flat include/using relationships between files and modules in one package. Breaking packages up anytime I need to go beyond this seems the only sensible way. If I want to start off extensible that means I should always start a new package when I have a new type/interface. This seems impractical.

Any suggestions for more extensible patterns for large projects where I can have multiple modules in one package?

Perhaps I am misunderstanding, but I don’t see defining dozens of modules within a package as a problem (FinEtools).
See FinEtools.jl, which groups multiple modules under one umbrella.

I didn’t intend to say it’s a problem, just that I didn’t know of a solution, and a quick peek around didn’t satisfy me. Julia provides lots of building blocks and it’s still young, so I didn’t doubt there were good solutions possible.

The FinEtools organization looks great. I was messing around a bit and started converging on something quite close to what you have there. The top-level FinEtools.jl file serves as a kind of project file. I have some questions still, but I’ll study this first.

Thanks!

A trap I fell into here was treating modules like C++ classes/files. In C++ you’d usually organize your include pattern top down, but with modules it appears you actually want the modules that are used by other modules to appear first. Otherwise you get LoadErrors… oh yeah, isn’t this what include guards are for in C++? I wonder if something similar would be worth doing in Julia.

Maybe this could be of use https://github.com/simonster/Reexport.jl?

Alright I messed around with the idea of emulating the C/C++ convention.

In C/C++ you would #include everything in a tree-like manner. That is nice because it typically mirrors how the files are laid out on disk and, if you are doing OOP, mirrors your inheritance relationships. Then you use a dirty little preprocessor macro to guard against including a file twice, since all those nested includes really just cut and paste into a top-level file.

Here’s how you would do it in Julia.
I emulate the application or library environment with a top-level module. I wrote a dirty little macro to guard against adding a module twice. Here’s an application built out of 4 (+1) modules, with the Ds module used twice at the “leaf” level.

It would be 5 files, but I did the include("Xs.jl") cut-and-paste by hand for brevity/clarity.
Note: in C/C++ the #include goes before your namespace declarations, so in Julia Bs.jl would look like this:

include("Ds.jl")
@once module Bs
....

Application.jl

module Application
  macro once(exp)
    if exp.head == :module
      if !isdefined(__module__, exp.args[2])
        esc(Expr(:toplevel,exp))
      end
    end
  end

  # include("As.jl")
  @once module As
    export A, foo
    abstract type A end
    foo(x::A) = 0
  end
  # end As.jl

  # include("Bs.jl")
  @once module Ds
    struct D
      d::Float64
    end
    bar(x::D) = x.d^2
  end

  @once module Bs
    using ..As: A
    using ..Ds: D
    import ..As.foo
    struct B <: A
      b::Float64
    end
    foo( x::D, y::B ) = x.d*y.b
  end
  # end Bs.jl

  # include("Cs.jl")
  # Ds is already defined above, so @once skips this second copy
  @once module Ds
    struct D
      d::Float64
    end
    bar(x::D) = x.d^2
  end

  @once module Cs
    import ..As.foo
    using ..Ds: D
    foo( x::D ) = x.d
  end
  # end Cs.jl
end

julia> include("Application.jl")

julia> Application.As.foo( Application.Ds.D(2.0), Application.Bs.B(10.0))
20.0

I haven’t sorted out exporting, I’ll check out reexport. Thanks for the link!
I’d also like to write using As instead of using ..As inside the sub-modules, but I’m ok with that. It’s a bit like #include "As.hpp" instead of #include <As.hpp>
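On the exporting question: it can be done without any extra package by bringing the submodule's names into the top-level module with using and listing them in export again (Reexport.jl automates this step). A minimal sketch reusing the As module from above; MyA is a hypothetical subtype added only for the demonstration:

```julia
module Application

module As
export A, foo
abstract type A end
foo(x::A) = 0
end

# `using .As` makes the names exported by As visible in Application;
# exporting them again passes them on to whoever does `using .Application`.
using .As
export A, foo

end # module Application

using .Application

struct MyA <: A end   # hypothetical subtype, not part of the original example

foo(MyA())   # returns 0, without writing Application.As.foo
```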

I edited my macro to return an Expr rather than eval directly. Not exactly sure why :toplevel is needed, but it is.
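On why :toplevel is needed: as far as I understand, a module expression is only accepted at top level, and the expression a macro returns does not count as top level unless it is wrapped in Expr(:toplevel, ...). Here is a slightly tidied sketch of the same guard, assuming (as the macro above does) that the module name is the second argument of the :module expression. The tag constant is made up to show which definition wins:

```julia
macro once(ex)
    # Only guard `module ... end` expressions; pass anything else through unchanged.
    (ex isa Expr && ex.head === :module) || return esc(ex)
    name = ex.args[2]
    # __module__ is the module in which the macro call is being expanded.
    isdefined(__module__, name) && return nothing
    # Wrap in :toplevel so the module definition is evaluated as a top-level statement.
    return esc(Expr(:toplevel, ex))
end

@once module Ds
    const tag = "first"
end

# Ds already exists at this point, so this second definition is dropped.
@once module Ds
    const tag = "second"
end

Ds.tag   # "first"
```

This only works when each @once call is its own top-level statement, because the isdefined check runs at macro-expansion time.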

Hope you don’t mind me reviving this old discussion thread. I thought it was interesting since I am also an old C++ developer.

I personally feel that thinking about Julia’s include as C/C++ include will typically just get you into trouble.

In C/C++, to actually use some chunk of code you need to include a file; that is not necessary in Julia. You simply import a module. Modules already have guards, so you avoid the problem of inventing your own method of checking whether a module is defined twice.

I think as C++ developers we have a tendency to think in terms of huge monolithic programs with deep nesting. I think a more sensible Julia approach is to simply make multiple separate packages that depend on each other. Instead of putting one huge code base into one repo, just make your app as a smaller package which has dependencies to say 3-4 other packages.

Then you avoid complicated module nesting. You get versioning between your packages and you get more clearly defined, smaller packages that people can use for other purposes.

I mean, we are not in a C++ world with complicated deployment where we want to bundle everything up. We have a great Julia package manager which will pull down everything you need.

And making modules at a granularity similar to a class is not necessary in Julia, or really desirable IMHO. I tend to have multiple types inside one module. A module should have types that are used together and form a coherent set of functionality. A module is like a library, not like a class. That multiple types use functions with the same name is not an issue in Julia. Due to multiple dispatch, name collisions are seldom a problem.
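To illustrate the point about several types sharing one module and one function name, here is a small sketch. The Shapes module and its contents are invented for the example:

```julia
module Shapes

export Circle, Square, area

struct Circle
    r::Float64
end

struct Square
    side::Float64
end

# One function name, one method per type; dispatch picks the right method.
area(c::Circle) = pi * c.r^2
area(s::Square) = s.side^2

end # module Shapes

using .Shapes

area(Square(2.0))   # 4.0
area(Circle(1.0))   # ≈ 3.14159
```

No Circle module or Square module is needed to keep the two area definitions apart.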

A benefit of relying more on the package manager than inventing your own system to manage a large number of submodules is that you can use the package manager to easily list and see dependencies. That gives you a much more convenient way of having an overview of what module depends on what other module.

If you do all this yourself inside one big monolithic package, then all these dependencies are just implied in your code. You have to discover it yourself by looking at the code.

Yes, this is what I’ve ended up doing. There are a few things that could be done to make this kind of granularity easier to work with. One minor annoyance is that I seem to end up with a very long using A,B,C,... line when I want to do something in the REPL.

However, I’m currently registering a chain of A,B,C,D in the general registry and with a 3 day waiting period for each that will take 12 days minimum. This is annoying, but a one time thing.

For packages intended (mainly) for your own use, you could consider maintaining your own registry where you pretty much do what you like (search the forum for discussions).

Not the case.

Another revive… by another C++ developer.
Let’s consider the Julia code editors.
If there is no include, just “using …”, the IDE has to look through the entire project in order to give you autocompletion for a specific module, struct, etc.
With @Orbots’s solution, IDEs would be able to provide autocompletion a lot faster.

I don’t really understand the issue you guys have, and I don’t think I even fully understood the problem @Orbots had. I don’t understand how autocomplete is a problem. If you are using some module X, in a file, then I assume there is a “using X” in that file and you will have access to autocomplete for anything in module X.

Is it possible to construct a very minimal code example that illustrates the problem?

Let’s say you have the file a.jl with the following content:

module a_module
    export hello

    using ..c_module

    function hello()
        println("a " * c_module.hello())
    end
end

  1. IDE autocomplete:
    When you type “c_module.” in the above code, the IDE has to know where c_module is defined, so it has to scan the entire project to find the full c_module definition. Now imagine there are several modules named “c_module” in the project, even sub-modules with this name, which is quite possible. How does the IDE know which one is going to be used without an appropriate include?

  2. Let’s say I write the code above and then leave the project. Somebody else comes along who has to maintain the code and add functionality. In order to quickly understand the code, this new guy has to know where c_module is located. Without an include, he has to look into each c_module in the project. Isn’t this a waste of time? When I first wrote the code I had a specific c_module in mind and I could simply have indicated it (via include).

I’m not saying it’s impossible to deal with issues 1 and 2 without the include; I am saying it takes more time and is often annoying, in my opinion.

I must confess that I am not sure whether I am the confused one or you are. Either I am really missing what you are trying to get at, or you have some important misunderstandings about how modules and include work.

  1. Why scan everything? You specify exactly where the c_module is. Your project should already be parsed and in the parsed structure you are clearly specifying where the c_module is located. How does it matter that there are several c_modules in your project? You are specifying one particular one. There is no confusion here about which one you have in mind.

  2. No, he doesn’t. You find the module including a_module. It will tell you where the c_module comes from, as the c_module is included in the module enclosing the a_module.
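A minimal sketch of that resolution rule, with everything pasted into one file. The Project and Other modules and the returned strings are made up for illustration, and hello returns a string here instead of printing:

```julia
module Project

module c_module
hello() = "top-level c"
end

module Other
    # A second, unrelated module that happens to have the same name.
    module c_module
    hello() = "nested c"
    end
end

module a_module
export hello
using ..c_module      # relative path: the c_module defined next to a_module
hello() = "a " * c_module.hello()
end

end # module Project

Project.a_module.hello()   # "a top-level c"
```

The ..c_module path is resolved relative to the module that encloses a_module, so Project.Other.c_module is never a candidate.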

These are not really issues, and they are artificial to begin with. Creating lots of nested modules in Julia strikes me as an anti-pattern. Modules are not classes. A module is like a namespace; it should encompass quite a lot of types and functions. I organize my modules into multiple subdirectories, but I don’t actually create submodules. If my package grew large enough to need that, I would rather create a separate package than begin nesting submodules.

If you stick all your code into one enormous module, then you make it really hard to reuse that code in other projects. It makes a lot more sense to split off reusable functionality into separate modules/packages which can easily be included in other projects. Not to mention you make it hard for other people to reuse your code.

In fact, what I find to be a horrible anti-pattern is the C++ propensity to build mega packages/libraries like Qt, OpenCV, VTK etc. It is extremely cumbersome to mix and match such mega packages/libraries with other libraries. You are forced to do enormous downloads only to get small amounts of functionality. You have to deal with complex build systems.

With the excellent version control and package management that exists in Julia, there should be no need to continue this mega-package habit that exists in the C++ community. Keep packages short and to the point. It makes reuse and maintenance easier IMHO.

There is a cost to maintaining a package. The cost is higher for complex software applications with many developers working on the same codebase. Someone is going to argue there isn’t, but if you can imagine a world outside your own where this is true, you can see there are situations where 1 package > n packages.

I think one gets a bit into philosophical territory there. Even when working on exactly the same piece of software, people tend to have widely different opinions on how to best manage that software. I remember the software I worked on, where I strongly believed in modularizing it while others championed a big monolith design.

But if you don’t believe in modularization, and you want a big monolith, then why have submodules in the first place? Why not just put all the code in one large module?

Or just keep things simple and have a one-level-deep submodule hierarchy like FinEtools.jl (PetrKryslUCSD/FinEtools.jl on GitHub).

I don’t get why you would want arbitrary deep nesting. That seems very Java/C# like. Most newer languages seem to prefer shallow over deep hierarchies. Look at Swift, Python or Julia for that matter.

Of course it is none of my business how you organize it; I am just trying to give advice to make your life easier. I have worked with people coming from other languages and have seen so often how they complicate their lives by insisting on workflows that make sense in the language or world they come from but don’t make sense in the new world.

The craziest example I have seen of this is Java enterprise developers entering mobile phone development. They now work on single-user systems, but they have worked all their lives on systems that deal with multiple users, so they immediately grab a database to save the most trivial amounts of data, not realizing that a plain old file works just fine on a single-user system when dealing with small amounts of data.

That is what I would reflect on in this case as well: how much of your imagined organization is dictated by the particularities of C++, versus what you actually need?

You are not.

I think this is the same issue discussed in [ANN] PatModules.jl: a better module system for Julia - #38 by patrick-kidger.

Simply because there is no include in the a.jl file, no include for the c_module. It’s really that simple.

How exactly? The IDE has no idea which is your main file, where all the includes meet nicely; it does not have this “root” point. The only thing the IDE sees is a c_module in the a.jl file. Which c_module? Well, any of the existing c_modules in the already parsed structure.

Your solution clearly illustrates the problem. He has to “find”. This is the waste of time I was talking about.