A slight flaw in the Project/Manifest/Env logic?

This post may seem long, but it’s a step-by-step walkthrough and should be easy to digest.

Let’s say I have developed a package (or more of a scientific simulation model) that depends on a specific version of DataFrames, v0.20.2. The Project.toml now has

[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"

and the Manifest.toml dictates the exact version I added

[[DataFrames]]
deps = ["CategoricalArrays", "Compat", "DataAPI", "Future", "InvertedIndices", "IteratorInterfaceExtensions", "Missings", "PooledArrays", "Printf", "REPL", "Reexport", "SortingAlgorithms", "Statistics", "TableTraits", "Tables", "Unicode"]
git-tree-sha1 = "7d5bf815cc0b30253e3486e8ce2b93bf9d0faff6"
repo-rev = "v0.20.2"
repo-url = "https://github.com/JuliaData/DataFrames.jl.git"
uuid = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
version = "0.20.2"
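
As a quick way to check which version a manifest records, here is a minimal sketch using Julia's TOML standard library (available from Julia 1.6, as far as I know). The manifest text is an abbreviated copy of the entry above, and it assumes the older manifest layout shown here, where `[[DataFrames]]` sits at the top level.

```julia
using TOML

# Abbreviated copy of the manifest entry shown above (old top-level layout).
manifest_text = """
[[DataFrames]]
git-tree-sha1 = "7d5bf815cc0b30253e3486e8ce2b93bf9d0faff6"
repo-rev = "v0.20.2"
uuid = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
version = "0.20.2"
"""

manifest = TOML.parse(manifest_text)
entry = manifest["DataFrames"][1]            # `[[DataFrames]]` parses as an array of tables
recorded = VersionNumber(entry["version"])   # version string recorded in the manifest
println("Manifest records DataFrames ", recorded)
```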

Now suppose a scientist wants to run my simulation model in their own project (call it sci-proj). So they create a new folder, activate a new environment, and run the command (sci-proj) pkg> dev mypkg (note the dev for this example; ideally it would be add [github link]).

But look what it adds!

(sci-proj) pkg> dev mypkg
Path `/home/affans/.julia/dev/mypkg` exists and looks like the correct package. Using existing path.
  Resolving package versions...
   Updating `~/sci-proj/Project.toml`
  [f88dc3b7] + mypkg v0.1.0 [`~/.julia/dev/mypkg`]
   Updating `~/sci-proj/Manifest.toml`
  [a93c6f00] + DataFrames v0.21.2

Why does it add DataFrames v0.21.2 when the Project/Manifest of the original package mypkg clearly dictates 0.20.2?

Even if I pin DataFrames in the original package, i.e.

(mypkg) pkg> pin DataFrames
   Updating `~/.julia/dev/mypkg/Project.toml`
  [a93c6f00] ~ DataFrames v0.20.2 #v0.20.2 (https://github.com/JuliaData/DataFrames.jl.git) ⇒ v0.20.2 #v0.20.2 (https://github.com/JuliaData/DataFrames.jl.git) ⚲
   Updating `~/.julia/dev/mypkg/Manifest.toml`
  [a93c6f00] ~ DataFrames v0.20.2 #v0.20.2 (https://github.com/JuliaData/DataFrames.jl.git) ⇒ v0.20.2 #v0.20.2 (https://github.com/JuliaData/DataFrames.jl.git) ⚲

I still get the newest version in my sci-proj?

(sci-proj) pkg> dev mypkg
Path `/home/affans/.julia/dev/mypkg` exists and looks like the correct package. Using existing path.
  Resolving package versions...
   Updating `~/sci-proj/Project.toml`
  [f88dc3b7] + mypkg v0.1.0 [`~/.julia/dev/mypkg`]
   Updating `~/sci-proj/Manifest.toml`
  [a93c6f00] + DataFrames v0.21.2

The only way I have found to fix this is to add a [compat] entry. So in the Project.toml of mypkg, I add

[compat]
DataFrames = "0.20.2"

and now finally, when I add this project to sci-proj, I get

(sci-proj) pkg> dev mypkg
Path `/home/affans/.julia/dev/mypkg` exists and looks like the correct package. Using existing path.
  Resolving package versions...
   Updating `~/sci-proj/Project.toml`
  [f88dc3b7] + mypkg v0.1.0 [`~/.julia/dev/mypkg`]
   Updating `~/sci-proj/Manifest.toml`
  [a93c6f00] + DataFrames v0.20.2

I feel like add (or equivalently dev) only looks at Project.toml files? Since the Project.toml file says there is a dependency on DataFrames, it basically just pulls in the latest one. I can get around this by using [compat] but why can’t it look at the combination of Project/Manifest.toml?
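
A side note on the [compat] entry above: as far as I know, a bare "0.20.2" is read as a caret specifier, meaning any version in [0.20.2, 0.21.0), rather than an exact pin; an exact pin would be written "=0.20.2". The sketch below checks this with `Pkg.Types.semver_spec`, which is an internal (non-public) Pkg helper, so treat it as an illustration that may change between Julia versions.

```julia
using Pkg

# NOTE: Pkg.Types.semver_spec is internal to Pkg, not documented public API.
caret = Pkg.Types.semver_spec("0.20.2")    # same string as in the [compat] entry
exact = Pkg.Types.semver_spec("=0.20.2")   # equality specifier

# Report which candidate versions each specifier accepts.
for v in (v"0.20.2", v"0.20.3", v"0.21.2")
    println(v, "  caret: ", v in caret, "  exact: ", v in exact)
end
```

v0.21.2 falls outside both specifiers, which is consistent with the resolver output above. A hypothetical 0.20.3 would satisfy the bare entry but not the "=" one.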

I thought about this when publishing a recent paper with my model. I published the paper using DataFrames#v0.20.2 and, at the time of publication, created a git tag. Users should be able to check out the tag in ten years and get DataFrames#v0.20.2 to reproduce the results. Note that since publication I’ve updated the model, added new algorithms and functionality, and even updated my DataFrames dependency to the latest version. But still… the code at that specific tag works with 0.20.2 and has no guarantee of working with later versions.

How do people solve this problem? Do I have to add a [compat] line every time I wrap up a paper, check in/tag the updated Project.toml file, and then remove the [compat] to continue working? This seems … not right.

** I also just realized that my approach is flawed as well. Because I don’t check in/commit/track my Manifest.toml file, anyone who adds my project in 10 years will get the latest version of DataFrames.

If somebody wants to use your project they do not need to add or dev it if you create it correctly. It does not have to be a Julia package. You should commit the Project.toml and Manifest.toml files. They then only need to activate and instantiate the project and there should be no issues. It will run as it did when you created it.

See https://github.com/JuliaDynamics/DrWatson.jl which is made for this purpose.

I am in fact using DrWatson to set up a science project. It created an environment for me. I added my simulation model (which has its own Project/Manifest.toml files, which dictate that the version of DataFrames to be used is 0.20.2). Except when I add my simulation model to my science project environment, it pulls in DataFrames 0.21.2.

The only way to get around that is by using compat, but I don’t think this is the right approach. I may be missing something.

No, they should set your project as the active one (e.g. by running julia --project in that folder or by using activate in the Pkg REPL). Then they run instantiate to install all the packages in the manifest, and then they run your code.

The manifest is only relevant if that project is your active project.
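
For anyone who prefers a script over the Pkg REPL, the same steps can be sketched with the Pkg API. `use_project` and the example path are made-up names for illustration; `Pkg.activate`, `Pkg.instantiate` and `Base.active_project` are the actual calls.

```julia
using Pkg

# Hypothetical helper: activate the project in `project_dir` and, optionally,
# install the package versions recorded in its Manifest.toml.
function use_project(project_dir::AbstractString; install::Bool = true)
    Pkg.activate(project_dir)       # make this folder's project the active one
    install && Pkg.instantiate()    # install what the committed manifest records
    return Base.active_project()    # path of the now-active project file
end
```

With a clone of the project at, say, path/to/mypkg (a placeholder path), calling use_project("path/to/mypkg") followed by using mypkg should load the package against the versions from its committed manifest.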

Hmm okay. The simulation model is a bit tricky for the users to run, which is why I’ve set it up as a package and exposed an API.

I was hoping the end user can have the following workflow

  1. create sci-project (i.e. new environment), for example using Dr Watson

  2. add my simulation model as a dependency by using
    -> add mypkg#git-tag-1
    -> this pulls a specific version of the code including Project.toml/Manifest.toml
    -> the dependencies it pulls in should be what’s given in the mypkg Project/Manifest

  3. run the main simulation function after setting up their parameters. This will reproduce the figures in the published paper.

Actually it’s even more complicated for me. I have different branches/tags in my code that correspond to different projects. For example, I am using an old version of the model to update results for a company I worked for, so I have a script that checks out that branch and runs the simulations. I was hoping that when I check out that branch, it would also use the right versions of the packages.

That’s totally fine, just tell the user to activate the project of the package, import the package and use the API.

Yes, it will, since the manifest will be different on that branch.

Okay, I think I am getting it now… but let’s say you are reading my paper and you want to reproduce my results. The code that produced these results is from a month ago, and uses different versions of the packages than the latest master. The tag is called mytag.

So you open up Julia and create a new environment. In this new environment, you run

my-new-environ > add github.com/mypkg#mytag

Even though the Manifest file at that tag says to download v0.20.2 of DataFrames, the Pkg manager actually retrieves the latest version (i.e. my original post).

So how can I tell the user to activate the project of the package itself?

Here are two ideas for install instructions that would get you that:

$ git clone -b mytag github.com/mypkg
$ cd mypkg
$ julia --project=. 
pkg> instantiate

or:

pkg> dev --local github.com/mypkg#mytag
pkg> activate dev/mypkg
pkg> instantiate