What does `include` do?

There have been questions about this before, but so far none of them have provided an answer at the level of detail I have been looking for.

My current understanding is that include is not a simple text substitution/replacement operation, which is how the C/C++ preprocessor works.

Those languages simply substitute the line include("myfile.c") with the contents of myfile.c whenever such an include statement is encountered.

Actually, in C/C++ the statement is #include "myfile.c" but you get the point.

From reading elsewhere, it looks to me a bit like include parses the target file and generates an AST.

The question I really wanted to answer relates to a problem which occurs with include in C/C++.

In these languages, it is possible to #include a file from multiple other files. This creates multiple definition errors, because from the point of view of the compiler, multiple copies of code containing the same names have been defined.

You are not allowed to define struct A in one place and then struct A in another place, and then try to run the linker to link the two bits of object code together.

Does this same problem occur in Julia? What does Julia do to solve this problem, or if it doesn’t exist, then why isn’t this a problem to consider?

The include statement evaluates the given file in the context of the current module. This is subtly different from just “pasting the content of the file at the location of the include statement”. If the include statement is at the top level of a module, then it is essentially equivalent to pasting the contents of the file. And indeed, that is how include is generally supposed to be used. What does not behave like pasting, however, is using include inside the body of a function: the file is still evaluated in the global scope of the module, not in the local scope of the function. That’s an example where understanding that include does not just paste in code becomes relevant.
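
To illustrate the function-body case, here is a minimal sketch. The file name snippet.jl, its contents, and the function name load_inside_function are made up for this example.

```julia
# Write a tiny file to a temporary directory (hypothetical content).
path = joinpath(mktempdir(), "snippet.jl")
write(path, "from_file = 42\n")

function load_inside_function()
    local_only = 1      # a local variable of this function
    include(path)       # evaluated in the global scope of the enclosing module
    return local_only
end

load_inside_function()

# The assignment in the file created a module-level global, not a local variable.
println(isdefined(@__MODULE__, :from_file))
```

If include were a plain paste, from_file would have been a local variable of the function and would not be visible afterwards.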

There are also versions of include that process the AST of the file before evaluating it, and that evaluate it in a different module than the current one. Those are pretty rarely used, though.

It is definitely possible to include the same file in multiple locations in Julia, just not recommended. If the includes are in the same module, then the second include will overwrite variables/functions that were injected into the current module from the first include. This will typically cause Julia to emit some warnings about overwritten methods. And, of course, it’s probably not something you want to do (what would be the point?).

Including the same file in different modules isn’t that much of a problem, but then you create multiple functions, e.g., A.f and B.f, that are different objects but run the same code. Again, probably not something you want, but not a “problem” as such, the way you describe in C/C++. There are situations where multiple includes can be useful. For example, in QuantumControl.jl, I include a reeport.jl file in multiple submodules of the package. This is a private helper function (a patched version of ReExport.jl) that I need inside each module.
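
As a rough sketch of the A.f / B.f situation: the file shared.jl and its one-line content are invented here, and I use the two-argument Base.include(module, path) form only so that the whole example fits in one script.

```julia
# A hypothetical shared file that defines a single function f.
path = joinpath(mktempdir(), "shared.jl")
write(path, "f() = 1\n")

module A end
module B end

# Evaluate the same file once in each module.
Base.include(A, path)
Base.include(B, path)

println(A.f === B.f)      # false: two distinct function objects
println(A.f() == B.f())   # true: both run the same code
```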

Generally, though, including the same file in multiple places has a very high chance of confusing both you and your tools (e.g., VSCode). So you’ll want to avoid it unless you have a very good reason to do it.


Thanks for clarifying, I see that I didn’t quite understand the situation accurately.

Ok this gives me something to think about.

For context:

I used to write a lot of C++, and typically as code bases became larger, it could sometimes be difficult to get the compiler to compile your code because of this multiple definition problem. Or, to put it perhaps more accurately, sometimes you might want to do something a certain way, you might want a particular structure in terms of what code lives in which files, and you might be prevented from doing so.

I’ve got to be honest, it has been quite a few years since I wrote a line of C++, so I won’t try and provide an example. It’s not trivial to conjure such a situation.

The reason for asking is I’m interested whether the same issues can arise in Julia. I suspect the answer is no due to a number of factors. It isn’t easy to explain why and I’m not certain my thoughts on this are correct right now.

One factor which is important is that in C++, the compile stages and link stages are independent. In addition, you can declare a function in one place and then define it elsewhere. I think these additional factors cause the problem, and Julia not having them means there is no problem.

I could be mistaken. I’d have to think about it and think back and remember more details of how the C++ compiler works.

Concretely, there are two common idioms that allow you to access the contents of a given file, say utils.jl, in multiple places. In both cases, you want to explicitly have utils.jl create a module:

module Utils
struct A
    # whatever...
end
# more stuff
end

Now you can either:

  • Explicitly include("utils.jl") once and then in the files you need to use it (which are themselves often included in that same top-level file), you can say using .Utils: A. Note the .-prefix! This is just saying that there’s a module defined in my current namespace, and I want to access something from it. If you’re working inside another module, you’d use two dots — using ..Utils — to say that you want the Utils module from the “parent” namespace.
  • Structure utils.jl itself into a proper package and track it in your project/manifest. Now you can just say using Utils (no dots!) and the package manager will handle loading it only once.
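
A minimal sketch of the first idiom, with the Utils module written inline instead of living in an included utils.jl. The Consumer module, the value field, and the make function are made up for illustration.

```julia
module Utils
struct A
    value::Int
end
end

module Consumer
using ..Utils: A     # two dots: Utils lives in the parent namespace
make() = A(1)
end

using .Utils: A      # one dot: Utils is defined in the current namespace
println(Consumer.make() isa A)   # true: both names refer to the same type
```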

Since you mention this, it is perhaps worth adding the following comment.

When it comes to application development rather than package (library) development, I haven’t yet become accustomed to how Julia expects me to work.

An application is typically a program with an entry point which uses a collection of libraries. It seems like each of those libraries should be a package, but perhaps not. Perhaps the whole application should be a package.

Julia is sufficiently different from Python that I’m fairly confident the ideas from Python do not transfer so well to Julia.

For example, to write an application in Python, one of the best ways to structure code is to build a single package, and run python in module mode with python3 -m module_name.

I’m not sure if you see exactly the point I am making here; it isn’t that common to see people do this. Most people just use a single python script as the entry point.

This doesn’t fit naturally with how Python expects you to structure your code.

It took me ages to figure this out. If you know of any good resources regarding designing structure for Julia projects, please do let me know. It would be great to absorb more information.

I think the key difference between Julia and classic compiled languages like C/C++ (and, to some extent, also Python) is how dynamic the compilation process is.

My mental model for Julia is this: You open up a Julia process, which begins with an empty module Main. Julia then starts to evaluate statements in the context of Main, either from a .jl file (if you ran julia file.jl), or lines you type into the REPL. Any line adds types/functions/constants/submodules to Main, or runs a function. Functions are just names, pointing to a method table. Code loading can be via include, using, or eval. These can affect definitions in the current module or any sub-module, and can add to or overwrite the methods table of any existing functions. Julia keeps track of which functions call which other functions, and re-compiles as necessary if any method table changes.
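
To make the “re-compiles as necessary” part concrete, here is a small sketch that can be run as a script or pasted into the REPL. The function names g and caller are made up.

```julia
g() = 1
caller() = g() + 1

println(caller())   # 2

# Overwrite the only method of g. Julia knows that caller depends on it
# and recompiles caller the next time it is called.
g() = 10

println(caller())   # 11
```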

So, unlike in C/C++, neither files nor modules are things that are compiled as independent entities. Packages do get compiled, but this is pre-compilation, so you don’t have to recompile everything every time you start a new Julia process and start loading packages. It’s an important part of how Julia works, but doesn’t really affect the mental model.

There also isn’t really anything like “application development” in Julia. It’s always just a Julia process, dynamically compiling stuff as it comes in, potentially exploiting cached compilation / pre-compilation. This might change a bit with the juliac static compilation work that’s being done at the moment. I haven’t really looked too much into that, but I think it basically only adds a specific entry point, so that everything that julia does from that entry point can be written to a static executable (while, as of recently, massively stripping out unreachable code to get the resulting binary to a reasonably small size). For now, though, or for “traditional” Julia usage, you should be thinking in terms of a REPL that dynamically compiles code in the background, rather than C-style compiling and linking.


Such a -m switch is already there in 1.12-DEV, with the same (or similar?) semantics. Note that in Julia, packages (and/or modules?) are precompiled, and scripts are not (by default). See also juliac, which is also available in 1.12. There’s also PackageCompiler.jl and more tools to compile Julia binary executables (or libraries), which have been working since before 1.10.

See open issue there, and from it merged PR:

Everything you said is great EXCEPT the bit that Julia 1.12 is NOT officially out yet!

You can hardly expect newbies to use it (Julia 1.12) for PRODUCTION work.

Regards
Steven Siew

I didn’t say use 1.12 for production (I’ve now edited to 1.12-DEV). I just got reminded of -m in 1.12, and I explicitly mentioned what works for sure in (currently supported) 1.10 and 1.11, since the rest is new.

It wasn’t too clear I was answering newbies, and this thread will exist for long after 1.12 is released. [I think, but am not sure, that juliac can be downloaded and used for the older 1.11 Julia, but yes, consider it brand-new/experimental for now; if you try it out, you might as well stay with 1.12-DEV, the nightly. It IS good to learn about the upcoming stuff, just don’t use it in production until it is released and blessed as stable.]

Still interesting to know about this, always good to keep ahead of the curve

I spoke with a friend of mine who still does C++ on a day to day basis. We came to the same conclusion that this is the real source of the multiple definition issues you can get with C++.

We came to the conclusion it’s because there is a separate compile and link phase, and each source code file is compiled first independently from all others in what is called a single translation unit.

Some more details for those interested.

In C++/C, you have source code files and header files. They are (supposed to be) treated differently. In header files you place things like function signatures, which tell the compiler that the actual function definition or body of the function will be provided at some later point and that everything will be linked together by the linker later.

The compiler takes your source code, which is a .c/.cpp file, and converts it into object code, which the linker will later use to produce an executable. This single source file, together with everything it #includes, is called a translation unit.

The problem comes with templates. The compiler sometimes needs to see the exact code which will be compiled to instantiate (create) an instance of a templated function. This is where both of our memories become a bit vague. It’s difficult to understand what specifically has gone wrong without seeing a concrete example.

But, the point is that this sometimes forces you to put the body of functions in a header file. If that header file is #include’d in more than one translation unit then you end up with a multiple definition, and have to re-think your design.

Often the easiest solution is to consolidate several files into one to maintain a single translation unit. But that can be a pain from a maintenance/repository structure point of view. (Few very large files.)

We (now strongly) suspect Julia doesn’t have the same problem. It seems like this problem occurs due to the design of data flow through the C++ compiler pipeline, rather than specifically being a language design problem.

I suppose my question is really only of interest to anyone who worked with C++ in the past. If you never used C++ before Julia you can basically ignore this. The TL;DR seems to be that C++ has a problem with the design of how it compiles code which Julia doesn’t have.

Actually - I just had a further thought about this.

I believe I am correct in thinking that Julia can sometimes be made to print a warning message about replacing one existing module(?)/function(?) name with a new definition?

I think that parallels the C++ problem. The C++ compiler will reject your code if you end up in this situation. I think Julia accepts it and just replaces an older definition with a new one.

Is anyone able to provide an example of this behavior? I don’t know enough about Julia to replicate it for myself yet.

You mean like this?

julia> module A end
Main.A

julia> module A end
WARNING: replacing module A.
Main.A

It’s not delineated how you’re thinking. The problem is reassigning “constant” variables, even ones that are implicitly so:

julia> A = 3
ERROR: invalid redefinition of constant Main.A
Stacktrace:
 [1] top-level scope
   @ REPL[27]:1

julia> const x=1
1

julia> x=2
WARNING: redefinition of constant Main.x. This may fail, cause incorrect answers, or produce other errors.
2

julia> x = 3.5
ERROR: invalid redefinition of constant Main.x
Stacktrace:
 [1] top-level scope
   @ REPL[31]:1

julia> struct X end

julia> struct X x::Int end
ERROR: invalid redefinition of constant Main.X
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1

So for the most part, you’ll get an error, because it’s dangerous when constants aren’t constant, especially when it messes with the type system. You’re allowed to reassign a constant with only a warning if the new value is an instance of the same type (this does not extend to types themselves), but you’re also accepting the risk that previously defined and compiled code keeps using the obsolete instances, which can include entire modules.

It’s worth highlighting when it’s not a reassignment.

julia> foo() = 0
foo (generic function with 1 method)

julia> foo(x) = 1
foo (generic function with 2 methods)

julia> struct Z end

julia> struct Z
         Z() = 0
       end

In both cases, the subsequent definition does not reassign the constant variables, they’re just adding or replacing methods. This is pretty important for interactivity in a multimethods language. When it comes to precompilation of packages, however, you get errors for method overwriting because behavior can easily be determined by an uncontrollable order of package loading. That has been likened before to linker warnings when loading 2 C libraries defining the same symbol, but the exact thread escapes me at the moment.

There are quite a few things you can’t or shouldn’t do during precompilation that are fine afterward; I imagine you’d see a few more parallels to AOT-compiled languages there. Note that the linked documentation section will say “modules” when it means “packages” in context; modules evaluated into a Julia session won’t be precompiled, so they don’t run into any of those issues.


Yes, that’s along the lines of what I was thinking. Does this mean that if you have two modules with the same name, you can’t include both at the same “namespace level”?

In Python you can do import X as Y to get around this problem. The problem here however is that in Julia import/using are different statements to include whereas in Python import does both things as a single step.

I would guess Julia provides a solution to this, somehow? I don’t recall reading about it in the docs. I know you can use ModuleName.functionName to distinguish between two different functions with the same name defined in two different modules.

Can the same be done with module names?

This is also a bit off. In both Julia and Python, the code’s environment determines what package (library) is associated with a symbol in imports; import X as Y will only find one package X in the environment, the as Y is not disambiguating anything. as does help disambiguate different objects in different modules that happen to use the same symbol, e.g. from X import foo as bar and from Y import foo as baz. Note that X.foo and Y.foo are still usable; there’s no issue with the same symbol in 2 namespaces having different roles, it’s only a conflict when a 3rd namespace tries to import those 2 roles with the same 1 symbol.

julia> module A
         module B end
         module C end
       end
Main.A

julia> module D
         using ..A: B # D borrows A.B
         module C end # different module
       end
Main.D

julia> using .A: B

julia> using .D: B # allowed because same home module and symbol

julia> using .A: C

julia> using .D: C
WARNING: ignoring conflicting import of D.C into Main

julia> C
Main.A.C

julia> using .D: C as C2

Again a bit off, so it might be worth breaking down what’s happening. Julia’s include evaluates source code in a file; it goes through the whole process of text → AST → runtime objects. The same way that evaluating println("hello, world") 2 times prints 2 lines, evaluating a file containing a module expression makes 2 modules; doing so in the same namespace is a discouraged conflict. In both Python and Julia, imports let different modules (namespaces) trade objects via symbols and renames. A Julia module could either be a package installed into the environment or a previously evaluated module expression within the same session or package source; you can tell the difference at the import because the latter has relative dots. Python has similar packages, but its other modules are instead encapsulated by files. The session’s first import of a package or Python file usually loads compiled code; evaluating source is only needed if the compiled code is outdated. Subsequent imports in the session are just making references to the loaded code.
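
A rough illustration of the text → AST → runtime objects steps on a single line of source. This only mimics what include does for each top-level statement of a file; it is not its actual implementation, and the source string is made up.

```julia
src = "answer = 6 * 7"                 # text

ast = Meta.parse(src)                  # text → AST
println(ast isa Expr)                  # true

value = Core.eval(@__MODULE__, ast)    # AST → evaluated in the current module
println(value)                         # 42
```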

Things in interactive-first languages tend to work very differently from AOT-compilation-first languages; it’s reasonably difficult to draw parallels.

Just a small note but when you say

That is true for C and often for C++. But if templates are being used, usually all the code goes inside the header files.

Julia doesn’t run into this problem, because you don’t need to include source in order to instantiate parametric types or specialize generic functions. You just import the module.
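
A small sketch of what that looks like in practice. The Shapes module, the Box type, and the unwrap function are hypothetical names for this example.

```julia
module Shapes
struct Box{T}
    item::T
end
unwrap(b::Box) = b.item
end

# No source inclusion at the use site: importing the names is enough.
using .Shapes: Box, unwrap

println(unwrap(Box(1)))              # Box{Int} is instantiated on demand
println(unwrap(Box("text")))         # so is Box{String}
println(Box(1.5) isa Box{Float64})   # true
```

Each concrete Box{T} and each specialization of unwrap is created when it is first needed, from the definitions already loaded in the session.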

In consequence, it’s rare in Julia to include a file more than once. You typically structure your code into one or more packages, each of which is broken up into one or more files and sub-modules, and each file is included only a single time. Here, include is used more like a Makefile to tell Julia what code makes up the package, in what order. You re-use code by importing modules (either submodules or other packages), not by include-ing files.


include trivia: if include is a normal Julia function, how can it possibly figure out which module it was called from?

The trick is that each module M implicitly contains an include(file) = Base.include(M, file) definition. This means that while A.sin == B.sin is true, A.include == B.include is false (same for eval). There are hundreds of distinct include-named functions in a typical julia process.
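
This can be checked directly with two empty modules (the names A and B match the ones used above):

```julia
module A end
module B end

println(A.sin === B.sin)           # true: both resolve to Base.sin
println(A.include === B.include)   # false: each module has its own include
println(A.eval === B.eval)         # false: same for eval
```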

(sorry if this is mildly OT!)


Except for Core and modules defined with the baremodule block such as Base. The docstring for baremodule actually points out module blocks automatically define a personal include to shadow Base.include, though it’s inaccurate because it only illustrates one include(p) method when there is a 2nd include(mapexpr, x) method.

This actually throws me off sometimes when I reach for include_string and realize I need to provide a module.
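
For reference, a minimal sketch of the call with the module passed explicitly (the code string is made up):

```julia
# include_string has no per-module shadow, so the target module is an argument.
result = include_string(@__MODULE__, "1 + 2")
println(result)   # 3
```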

Elsewhere, I was told that this isn’t what Julia’s include does. I was told that it does a bit more sophisticated version of the C++ preprocessor text substitution of #include. Let me see if I can find that.

It may have been here actually, I’ve somewhat lost track.