I am trying to understand this documentation page which is related to code loading in Julia.
Under the Environment section, two types of environment are listed:
1. Project Environment
2. Package Directory
I don’t understand what the difference between these is exactly, or why the difference might be important.
My present understanding, which may be wrong, is that a Package Directory is a directory containing a set of Project Environments.
I have come to this conclusion based on what is written here:
1. A project environment is a directory with a project file and an optional manifest file, and forms an explicit environment. The project file determines what the names and identities of the direct dependencies of a project are. The manifest file, if present, gives a complete dependency graph, including all direct and indirect dependencies, exact versions of each dependency, and sufficient information to locate and load the correct version.
2. A package directory is a directory containing the source trees of a set of packages as subdirectories, and forms an implicit environment. If X is a subdirectory of a package directory and X/src/X.jl exists, then the package X is available in the package directory environment and X/src/X.jl is the source file by which it is loaded.
My interpretation of this is that 2 is a directory containing one or more 1’s.
Possibly a difference might be that packages do not have to have the project file and manifest file. Even if this is the case, and I’m not certain that this is correct, I don’t understand why this should be important.
Would anyone be able to explain further, possibly with an example of each structure? (The directories and files contained within each.)
A project environment just means any directory with a Project.toml file (including a package directory), and it may contain many other unrelated files too. If you activate this environment, then the packages listed in the Project.toml file will become available to load with import or using.
A package directory defines a package and contains the source code for it. If you were to download a Julia package from GitHub, the download would be a package directory. If you activate this environment, the packages listed in the Project.toml file become available in the same way as above. (These are the dependencies of the package.) Additionally, in this case, the package itself becomes available to load too and there is a header with package info in the Project.toml file.
Generally, you will be working in a standard project environment unless you are developing your own package. Note that there are also shared environments (like the default global environment named with the Julia version number) that can be used to avoid saving Project.toml files in different directories.
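A minimal sketch of what "activating" does, assuming the do-block form of Pkg.activate is available in your Julia version. The temporary directory and its empty Project.toml are stand-ins made up for illustration:

```julia
import Pkg

# A throwaway directory holding nothing but a Project.toml with no dependencies.
envdir = mktempdir()
write(joinpath(envdir, "Project.toml"), "[deps]\n")

# Activate it only for the duration of the block, and ask Julia which
# project file it now treats as the active one.
active = Pkg.activate(envdir) do
    Base.active_project()
end

println(active)
```

Outside the block, the previously active project is restored, so this does not disturb your session.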
This is all very subtle, with words like “project”, “package”, “environment” and “directory” taking slightly different meanings depending on the context. But I’d say the above two quotes are slightly incorrect. Let me try and explain my understanding.
First, I find that the glossary in Pkg.jl’s documentation gives useful and precise definitions of the terms used throughout the Julia documentation. To summarize, I’d say that:
- a project is any source tree following the “standard layout” and containing a Project.toml file declaring its dependencies. I’d say a defining feature is that a project can depend on third-party Julia code.
- a package is a particular project that can itself be used as a dependency by other projects.
As such, the Project.toml file of a package needs to contain some metadata (e.g. a UUID) that may be missing in a project that is not a package. In practice most projects actually are packages (i.e. they have their own UUID and such), even when they aren’t meant to be re-used as a dependency by other projects. This is because most projects are created by tools like Pkg.generate or PkgTemplates.jl, which will automatically take care of giving the project a name and UUID of its own.
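To make the difference concrete, here is a small sketch comparing the two kinds of Project.toml using the TOML standard library. The name, UUID and version given to Foo are invented for illustration, and is_package is a hypothetical helper, not something Pkg provides:

```julia
using TOML

# Project.toml of a plain project: it only declares its dependencies.
plain_project = TOML.parse("""
[deps]
Example = "7876af07-990d-54b4-ab0e-23690620f79a"
""")

# Project.toml of a package: the same [deps], plus an identity of its own.
# The name/uuid/version values here are made up.
package_project = TOML.parse("""
name = "Foo"
uuid = "12345678-1234-1234-1234-123456789abc"
version = "0.1.0"

[deps]
Example = "7876af07-990d-54b4-ab0e-23690620f79a"
""")

# Hypothetical helper: treat a project as a package when it has a name and a UUID.
is_package(project) = haskey(project, "name") && haskey(project, "uuid")

println(is_package(plain_project))    # false
println(is_package(package_project))  # true
```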
I think it’s now easier to explain what goes on with environments. The two kinds of environments are referred to as “project environment” and “package directory” in the code loading section of the manual, but Pkg.jl’s glossary respectively refers to them as “explicit” and “implicit” environments; I find the latter terminology to be easier to understand.
- A “project environment” (or “explicit environment”) is an environment in which a Project.toml file explicitly gives the mapping between a dependency name (i.e. what you put after using or import) and a piece of code to load. (And this is true whether the Project.toml file defines a project that is a package or not.)
- A “package directory” (but I prefer the “implicit environment” terminology) is an environment in which the mapping between name and code is implicitly defined by the subdirectory structure, i.e. using Foo loads the code located in Foo/src/Foo.jl. Note that in this case, there may or may not exist a Foo/Project.toml. If there is one, it may or may not define a UUID for Foo. That is to say, Foo may or may not be a package.
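Here is a rough sketch of an implicit environment built from scratch, to show that the directory layout alone is enough for Julia to find the code. The ImplicitDemo name is made up, and I use Base.identify_package and Base.locate_package to query the loader rather than actually loading the module:

```julia
# Create a throwaway package directory containing a single source tree.
pkgdir = mktempdir()
srcfile = joinpath(pkgdir, "ImplicitDemo", "src", "ImplicitDemo.jl")
mkpath(dirname(srcfile))
write(srcfile, "module ImplicitDemo\ngreet() = \"hello\"\nend\n")

# Put the directory on the stack; there is no Project.toml anywhere.
pushfirst!(LOAD_PATH, pkgdir)
pkg = Base.identify_package("ImplicitDemo")   # which identity does the name map to?
located = Base.locate_package(pkg)            # which file would be loaded?
popfirst!(LOAD_PATH)                          # leave LOAD_PATH as we found it

println(pkg.uuid)   # no UUID is attached to the name in this case
println(located)    # path of the ImplicitDemo.jl file created above
```

With the directory still on LOAD_PATH, a plain using ImplicitDemo would load that same file.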
And as you probably understood already, several environments can be combined together to form an “environment stack”.
So let’s imagine I just created a new Foo package that depends on Example.jl:
/tmp> julia -e 'import Pkg; Pkg.generate("Foo")'
Generating project Foo:
Foo/Project.toml
Foo/src/Foo.jl
/tmp> cd Foo
/tmp/Foo> julia --project -e 'import Pkg; Pkg.add("Example")'
Updating registry at `~/.julia/registries/General.toml`
Resolving package versions...
Installed Example ─ v0.5.5
Updating `/tmp/Foo/Project.toml`
[7876af07] + Example v0.5.5
Updating `/tmp/Foo/Manifest.toml`
[7876af07] + Example v0.5.5
Precompiling project...
2 dependencies successfully precompiled in 1 seconds
Setting this project as the “home environment” gives us a stack with 3 environments:
/tmp/Foo> julia --project -e "foreach(println, Base.load_path())"
/tmp/Foo/Project.toml
~/.julia/environments/v1.10/Project.toml
~/.julia/juliaup/julia-1.10.6+0.x64.linux.gnu/share/julia/stdlib/v1.10
The first environment in the stack is explicit (or a “project environment”): it explicitly defines a mapping between using Example and the code of Example.jl at version 0.5.5.
Note that the fact that Foo itself is a full package isn’t important here, as illustrated by the second environment in the stack. This one is also explicit, but is a bit different in that it doesn’t adopt the “standard layout” of a full project / package: it’s merely a Project.toml+Manifest.toml pair:
/tmp> tree ~/.julia/environments/v1.10/
~/.julia/environments/v1.10/
├── Manifest.toml
└── Project.toml   # <-- if you look into this you'll see it does not
                   #     define a package: it has neither name nor UUID
Now an example of an implicit environment (or a “package directory”) would be the third entry in the stack: this is how the standard libraries shipped with Julia are organized. In this environment, using ArgTools loads the code in ArgTools/src/ArgTools.jl:
/tmp> tree ~/.julia/juliaup/julia-1.10.6+0.x64.linux.gnu/share/julia/stdlib/v1.10
~/.julia/juliaup/julia-1.10.6+0.x64.linux.gnu/share/julia/stdlib/v1.10
├── ArgTools
│   ├── Project.toml   # <-- It so happens that there is a complete Project.toml
│   └── src            #     with a UUID, but it could as well not be here
│       └── ArgTools.jl
├── Artifacts
│   ├── Project.toml
│   └── src
│       └── Artifacts.jl
[...]
Oh and maybe a word of caution: I’d say it’s not terribly popular these days for normal Julia users to create “package directories”.
If you develop a bunch of libraries that are meant to be used in another project, I’d say the recommended way would be to structure each library as a real package, and then to Pkg.develop each one of them into the project that needs to use them. This way you only deal with explicit environments and explicitly declared dependencies, rather than putting all libraries in the same top directory and tweaking LOAD_PATH to add this directory to the stack.
Ah I’ve just realized. This means if you have a subdirectory called DataFrames this could potentially conflict with the package DataFrames which would otherwise be downloaded from the pkg repository.
Yes, that’s the idea with “package directories”: everything is determined by the subdirectory name, there is no attempt at disambiguating identical names with UUIDs or choosing a specific version.
I think it is the first environment in the stack that knows about the name you provided. For example, if you have
/path1/Project.toml containing a [deps] line referring to DataFrames (i.e. an explicit environment knowing about DataFrames), and
/path2 containing a DataFrames/src/DataFrames.jl source file (i.e. an implicit environment also knowing about the DataFrames name),
then
julia> LOAD_PATH
3-element Vector{String}:
"/path1/Project.toml"
"/path2"
"@stdlib"
julia> using DataFrames
will refer to DataFrames coming from the registry, downloaded by Pkg in the version determined by /path1/Manifest.toml, and automatically stored somewhere under the ~/.julia directory.
Whereas
julia> LOAD_PATH
3-element Vector{String}:
"/path2"
"/path1/Project.toml"
"@stdlib"
julia> using DataFrames
will refer to whatever happens to be in /path2/DataFrames/src/DataFrames.jl.
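If you want to check this precedence rule yourself without installing anything, here is a sketch that builds one explicit and one implicit environment in temporary directories, both knowing the same name. The Widget name, its all-placeholder UUID and the resolve_with helper are invented for the example. I only ask the loader which identity the name resolves to, so no code is actually loaded:

```julia
# Explicit environment: a Project.toml whose [deps] mentions Widget (placeholder UUID).
explicit_dir = mktempdir()
write(joinpath(explicit_dir, "Project.toml"), """
[deps]
Widget = "00000000-1111-2222-3333-444444444444"
""")

# Implicit environment: a directory containing Widget/src/Widget.jl.
implicit_dir = mktempdir()
mkpath(joinpath(implicit_dir, "Widget", "src"))
write(joinpath(implicit_dir, "Widget", "src", "Widget.jl"), "module Widget end\n")

# Hypothetical helper: resolve the name under a temporary LOAD_PATH, then restore it.
function resolve_with(stack)
    saved = copy(LOAD_PATH)
    try
        empty!(LOAD_PATH)
        append!(LOAD_PATH, stack)
        return Base.identify_package("Widget")
    finally
        empty!(LOAD_PATH)
        append!(LOAD_PATH, saved)
    end
end

explicit_first = resolve_with([explicit_dir, implicit_dir])
implicit_first = resolve_with([implicit_dir, explicit_dir])

println(explicit_first.uuid)  # the UUID declared in the Project.toml
println(implicit_first.uuid)  # no UUID: the subdirectory name alone decided
```

Swapping the two entries changes which environment gets to answer first, which is exactly the LOAD_PATH ordering effect described above.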