How to generate reproducible random numbers across versions via package manifest?

I am trying to ensure that random numbers generated across different Julia versions are reproducible.
I found this part of the documentation:
https://docs.julialang.org/en/v1/stdlib/Random/#Reproducibility

It has a parenthetical remark:

(You can also, of course, specify a particular Julia version and package manifest, especially if you require bit reproducibility.)

Could someone explain what that means in more detail?
Does this mean modifying the [compat] section of Project.toml?
If so, which packages? The julia base package itself? What about Distributions.jl? Others?

When you add packages to a project environment, the exact versions of everything installed in that environment are recorded in a Manifest.toml file, which lives next to the Project.toml file. The idea behind that quote from the docs is that you can say “Here, use this Manifest.toml and this specific julia version” so that other people using your code can reproduce your results exactly by instantiating (]instantiate) your environment.

Another option would be to use the RNGs provided by StableRNGs, which have the express purpose of not breaking, though again you’d have to provide the version (and/or Manifest.toml) you’ve used.
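A minimal sketch of the StableRNGs approach, assuming the StableRNGs.jl package is installed in the active environment. The seed 123 and the helper name `simulate` are hypothetical; the only point is that the generator is passed explicitly instead of relying on the default global RNG.

```julia
using StableRNGs  # third-party package; must be listed in Project.toml

# Hypothetical helper: every random draw goes through the `rng` argument,
# so the default global RNG is never touched.
function simulate(rng, n)
    draws = rand(rng, n)   # n uniform Float64 draws in [0, 1)
    return sum(draws) / n  # sample mean
end

rng_a = StableRNG(123)  # the seed value is an arbitrary choice
rng_b = StableRNG(123)

# Two generators created with the same seed yield the same stream,
# so both calls return the same sample mean.
println(simulate(rng_a, 1_000) == simulate(rng_b, 1_000))
```

Packages that sample through an RNG argument (for example `rand(rng, dist)` in Distributions.jl) can be driven the same way, but their sampling algorithms are only fixed if the package versions are pinned as well.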

Ideally, though, your code doesn’t rely on the specific random numbers and has other checks for ensuring correctness (e.g. statistical properties of your algorithm, or invariants that have to hold no matter what you pass in, as long as it’s valid input).
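To illustrate what such a check could look like, here is a small sketch that tests a statistical property instead of exact draws. The function `estimate_pi`, the seeds, and the tolerance are all hypothetical choices for this example.

```julia
using Random

# Monte Carlo estimate of pi: the fraction of uniform points in the unit
# square that fall inside the quarter circle, times 4.
function estimate_pi(rng::AbstractRNG, n::Integer)
    hits = 0
    for _ in 1:n
        x, y = rand(rng), rand(rng)
        hits += (x^2 + y^2 <= 1)
    end
    return 4 * hits / n
end

# The check depends on closeness to pi, not on the exact draws, so it is
# expected to pass for any seed and any Julia version. With 10^6 samples the
# standard error is about 0.0016, so 0.02 is a very loose tolerance.
for seed in (1, 2, 3)
    est = estimate_pi(MersenneTwister(seed), 1_000_000)
    @assert abs(est - pi) < 0.02
end
```

A test like this verifies the method, but it does not make published numbers bit-reproducible; that still needs a fixed generator and fixed package versions.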

This is for reproducing the results of Monte Carlo simulations, so the random numbers are “data”. (Also, the suggestion in the documentation of saving the datasets is pretty infeasible given the number and size of the simulations being run.)

I guess my confusion is about whether I need to “do” anything additionally to ensure reproducibility? Is it enough for me to just be working in a project environment and ]add'ing the various packages I need as I go? Then the automatically-generated Manifest.toml will ensure that the random numbers generated will be the same in future versions? Is that correct?

I’m not an expert in statistical simulation, but from what I understand about MC methods, the exact data shouldn’t really matter as long as the code/method to obtain your results is sound. If you can only produce correct results with a specific set of (random) data, it’s an indicator that something is wonky.

Of course, providing a given seed to allow other people to check your results bit-by-bit is fine (though no more than that).

As long as the people using your code/trying to reproduce it are using the same julia version as you and use the Manifest.toml (this would also ensure the same package versions are loaded) you’d have to provide to them, that should be the case, yes. The only thing I can imagine going wrong is if (by pure chance) your code relies heavily on floating point shenanigans on your specific machine (which it shouldn’t for MC), but at that point no effort of ensuring the same code runs would save you anyway, since they don’t have your machine.

I’m not an expert in statistical simulation, but from what I understand about MC methods, the exact data shouldn’t really matter as long as the code/method to obtain your results is sound. If you can only produce correct results with a specific set of (random) data, it’s an indicator that something is wonky.
Of course, providing a given seed to allow other people to check your results bit-by-bit is fine (though no more than that).

It depends what you mean by “doesn’t really matter.” The whole point of reproducibility is that the results reported in the paper are exactly reproducible from the code, which requires getting the same draws. If different results are obtained, that doesn’t mean that the code/method is not sound. But it does mean the results are not reproducible.

As long as the people using your code/trying to reproduce it are using the same julia version as you and use the Manifest.toml (this would also ensure the same package versions are loaded) you’d have to provide to them, that should be the case, yes. The only thing I can imagine going wrong is if (by pure chance) your code relies heavily on floating point shenanigans on your specific machine (which it shouldn’t for MC), but at that point no effort of ensuring the same code runs would save you anyway, since they don’t have your machine.

Now I don’t follow. You say:

As long as the people using your code/trying to reproduce it are using the same julia version as you…

But the whole point here (see original post) is to ensure reproducibility across different julia versions.

Right, of course! :)

Right, since the number stream is allowed to change between different julia versions, I had to add that caveat. The random number stream has changed in a minor version in the past, I think 1.5 → 1.6 made it thread-local (changing the stream) and 1.6 → 1.7 has a bugfix (changing the stream again). If you really truly want to ensure stability across julia versions, use StableRNGs.jl (linked above). At that point, you’ll at least get the same RNG values and together with the fixed package versions from your Manifest.toml, you’ll have reproducibility across julia versions (since the Manifest.toml ensures that even with a higher julia version, the packages remain fixed to the versions you used).

Right, since the number stream is allowed to change between different julia versions, I had to add that caveat. The random number stream has changed in a minor version in the past, I think 1.5 → 1.6 made it thread-local (changing the stream) and 1.6 → 1.7 has a bugfix (changing the stream again). If you really truly want to ensure stability across julia versions, use StableRNGs.jl (linked above).

This is also indicated as a solution in the documentation. However, my original question was about the parenthetical remark:

(You can also, of course, specify a particular Julia version and package manifest, especially if you require bit reproducibility.)

Are you saying that’s actually not possible?


Another thing I am confused on which is perhaps related, if not, maybe slightly off-topic.
My project runs fine in Julia 1.1.
However, when I try to run it in Julia 1.6 I get some errors.
That’s even though I did ]instantiate.
I’m confused—isn’t the point of instantiation that if someone is using a later version of Julia (or its packages) they can have backward compatibility with my code?

That remark is saying

  • Use the same julia version as used when originally writing the code and
  • instantiate the manifest that was created when writing the code

It does not apply to using a different julia version (if you need that, use StableRNGs.jl).
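The two conditions above could be enforced at the top of a run script roughly as follows. This is only a sketch: `require_julia_version` is a hypothetical helper, and the version number in the usage comment is a placeholder for whatever version produced the original results.

```julia
# Hypothetical guard: refuse to run under a Julia version other than the one
# that was used to produce the published results.
function require_julia_version(expected::VersionNumber; actual::VersionNumber = VERSION)
    actual == expected ||
        error("results were generated with Julia $expected, but this is Julia $actual")
    return true
end

# Typical use at the top of a run script (placeholder version number):
#
#   using Pkg
#   require_julia_version(v"1.6.0")
#   Pkg.activate(".")    # use the Project.toml/Manifest.toml in this directory
#   Pkg.instantiate()    # install exactly the versions recorded in Manifest.toml
```

Failing early with a clear message is a design choice here; a warning would also work if running on a different version is acceptable for exploratory use.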

No, that should definitely work (no minor julia release should be breaking) - what kind of error do you get? Please post it in full.

In theory, some package could have set an upper bound on the julia version, though that shouldn’t be the case (unless it’s upperbounding to the next major version, i.e. all 1.x.y are ok but 2.x are not).

It does not apply to using a different julia version (if you need that, use StableRNGs.jl).

Got it, thanks for clarifying.
My confusion in that sentence was related to this part of the Pkg.jl documentation, which says that you can also specify a julia version in Project.toml:

Compatibility for a dependency is entered in the Project.toml file as for example:

[compat]
julia = "1.0"
Example = "0.4.3"

I had thought the statement about reproducibility of random numbers meant that if I included a julia version as above, then I would get the same sequence of random numbers. If that’s not actually true, then what is the intended use-case for this?

[compat]
julia = "1.0"



No, that should definitely work (no minor julia release should be breaking) - what kind of error do you get? Please post it in full.
In theory, some package could have set an upper bound on the julia version, though that shouldn’t be the case (unless it’s upperbounding to the next major version, i.e. all 1.x.y are ok but 2.x are not).

My project has a lot of code, so I tried to whittle it down to a minimal-ish working example.
It has something to do with the Manifest.toml (which is too large to post). Leaving the Project.toml was sufficient and necessary to reproduce the error. So perhaps I am doing something wrong in how I am using Project.toml?

Files:

# ./TestingVersions2/Project.toml 
name = "TestingVersions"
uuid = "494561fa-ca9a-11ea-03a1-2d0453b7d2f0"
authors = ["torgo"]
version = "1.0.0"

[deps]
BenchmarkTools = "6e4b80f9-dd63-53aa-95a3-0cdb28fa8baf"
CSV = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
CategoricalArrays = "324d7699-5711-5eae-9e2f-1d82baa6b597"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
Dates = "ade2ca70-3891-5945-98fb-dc099432e06a"
DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab"
Distributed = "8ba89e20-285c-5b6f-9357-94700520ee1b"
Distributions = "31c24e10-a181-5473-b8eb-7969acd0382f"
GLM = "38e38edf-8417-5370-95a0-9cbb8c7f171a"
Gurobi = "2e9cd046-0924-5485-92f1-d5272153d98b"
JuMP = "4076af6c-e467-56ae-b986-b466b2749572"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Logging = "56ddb016-857b-54e1-b83d-db4d58db5568"
LoggingExtras = "e6f89c97-d47a-5376-807f-9c37f3926c36"
MathOptFormat = "f4570300-c277-12e8-125c-4912f86ce65d"
MathOptInterface = "b8f27783-ece8-5eb3-8dc8-9495eed66fee"
Parameters = "d96e819e-fc66-5662-9728-84c9c7592b0a"
PrettyTables = "08abe8d2-0d0c-5749-adfa-8a2ac140af0d"
Printf = "de0858da-6303-5e67-8744-51eddeeeb8d7"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"
Roots = "f2b01f46-fcfa-551c-844a-d8ac1e96c665"
Sobol = "ed01d8cd-4d21-5b2a-85b4-cc3bdc58bad4"
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
StatsBase = "2913bbd2-ae8a-5f71-8c99-4fb6c76f3a91"

[compat]
CSV = "= 0.5.26"
CategoricalArrays = "= 0.7.7"
DataFrames = "= 0.20.2"
Distributions = "= 0.23.1"
GLM = "= 1.3.8"
Gurobi = "= 0.7.4"
JuMP = "= 0.20.1"
MathOptFormat = "= 0.4.0"
MathOptInterface = "= 0.9.7"
Parameters = "= 0.12.0"
Roots = "= 0.8.4"
Sobol = "= 1.3.0"
StatsBase = "= 0.32.2"
julia = "= 1.1"

# ./TestingVersions2/src/TestingVersions.jl
module TestingVersions
    using DataFrames

    function testing_df_assignment()
        df = DataFrame(a = rand(2))
        df[:, :b] .= 2
    end
    export testing_df_assignment
end

First I run this in Julia 1.1.0:

using Pkg
Pkg.activate(".")
Pkg.instantiate()
using TestingVersions
testing_df_assignment()

I get the expected output:

2-element Array{Int64,1}:
 2
 2

and a Manifest.toml is generated.

Then I go to Julia 1.6.0.
The same sequence of commands produces:

ERROR: MethodError: no method matching ndims(::Type{DataFrames.LazyNewColDataFrame{Symbol}})
Closest candidates are:
  ndims(::DataFrames.DataFrameRow) at /home/at/.julia/packages/DataFrames/S3ZFo/src/dataframerow/dataframerow.jl:166
  ndims(::Base.Iterators.ProductIterator) at iterators.jl:967
  ndims(::AbstractChar) at char.jl:191

Any idea what I am missing?

Ok, I made the Project.toml file simpler, while still being able to produce the same error.

name = "TestingVersions"
uuid = "494561fa-ca9a-11ea-03a1-2d0453b7d2f0"
authors = ["torgo"]
version = "1.0.0"

[deps]
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"

[compat]
DataFrames = "= 0.20.2"

To be clear, I understand that there may be something in Julia 1.6 that is incompatible with the DataFrames version I am using.
But I thought that the point of the project environment was to ensure that this is not a problem for being able to reproduce someone’s code.
Am I using Project.toml incorrectly?

That’s not possible in general: a manifest can’t be instantiated as-is by a different (minor) version of Julia than the one used to create it, because different versions of packages have different requirements on the Julia version. More specifically, in general don’t expect to be able to instantiate with Julia v1.X a manifest generated with v1.Y, with X < Y (you usually have more luck with instantiating a manifest generated with an older version of Julia).

This is a general statement; of course, if you’re lucky you can create a manifest that is usable across multiple versions, but it is likely to be almost empty.

From Julia v1.7, manifests will clearly specify the version of Julia they were created with, because that’s the only version you can expect to use to reinstantiate it with.
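As a sketch of how that recorded version could be checked programmatically: the code below assumes the manifest stores it under a top-level `julia_version` key, and both function names are hypothetical. Manifests written by older Julia versions lack the field, in which case `nothing` is returned.

```julia
using TOML  # standard library since Julia 1.6

# Return the Julia version recorded in a manifest, or `nothing` when the
# field is absent (assumed key name: "julia_version").
function manifest_julia_version(path::AbstractString)
    manifest = TOML.parsefile(path)
    v = get(manifest, "julia_version", nothing)
    return v === nothing ? nothing : VersionNumber(v)
end

# Hypothetical usage: warn when the running Julia differs from the recorded one.
function warn_on_version_mismatch(path::AbstractString = "Manifest.toml")
    recorded = manifest_julia_version(path)
    if recorded !== nothing && recorded != VERSION
        @warn "Manifest was created with a different Julia version" recorded VERSION
    end
    return recorded
end
```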

More specifically, in general don’t expect to be able to instantiate with Julia v1.X a manifest generated with v1.Y, with X < Y (you usually have more luck with instantiating a manifest generated with an older version of Julia).

If that’s the case, then my problem is solved. I just would have never tried to go down this path to begin with.

But two things remain unresolved for me:

  1. @Sukera said the opposite above:

    No, that should definitely work (no minor julia release should be breaking) - what kind of error do you get? Please post it in full.

  2. For what reason would one add something like

    [compat]
    julia = "1.0"
    

to the Project.toml?

How’s that the opposite of what I said? I said that you shouldn’t expect to be able to instantiate with an older version of Julia a manifest created with a newer version (it may happen that you can, but it won’t happen in general), but the other way around (new version of Julia instantiating a manifest created with an older one) should be safer.

I can reproduce the error. So I think this might genuinely be a bug.

But if it’s a bug on DataFrames.jl’s end, it won’t be fixed because that is a very old version of DataFrames and we have reached 1.0 now, so upgrading data frames (while fixing deprecated functionality) would be the way to go.

If it’s a bug on Julia’s end, which I would think is unlikely, you should file an issue.

But diagnosing the source of the error might take some work.

The compat section is for specifying which kinds of julia or package versions are compatible with the code you wrote. E.g. if you put julia = "1" in there, you’re claiming compatibility with all versions of the form 1.x.y. There’s a section in the Pkg docs that goes into more detail about these bounds. It doesn’t have any bearing on what julia version you’re using to actually run the code (other than requiring at least the minimum version specified). There’s no automatic selection of the code from a specific version.
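To make the meaning of such a bound concrete, here is a deliberately simplified sketch. `in_compat_range` is a hypothetical helper, not part of Pkg, and it only covers the case of a lower bound with a nonzero major version; the full rules are in the Pkg docs.

```julia
# Simplified check for a caret-style bound with a nonzero major version:
# a lower bound such as "1.0" admits every release in [1.0.0, 2.0.0).
function in_compat_range(v::VersionNumber, lower::VersionNumber)
    upper = VersionNumber(lower.major + 1)  # next major version, e.g. 2.0.0
    return lower <= v < upper
end

in_compat_range(v"1.6.0", v"1.0")  # true: 1.6.0 satisfies julia = "1.0"
in_compat_range(v"2.0.0", v"1.0")  # false: the next major version is excluded
in_compat_range(VERSION, v"1.0")   # only tests the running version; nothing is switched
```

The last line reflects the point made above: the bound is a claim that gets checked against whatever Julia is running, and it does not select or emulate a particular version.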

After instantiating the project in 1.6, can you post the output of ]status?

Not via package manifest but how about

  1. copying the current stdlib/Random directory from the JuliaLang/julia repository (master branch) on GitHub
  2. create MyRandom package with it
  3. use MyRandom in your projects across Julia versions

It would be a fairly small package, and it would also make it easier to share your work and research with others. Otherwise you will have to tell people how to set up their own package manifest to reproduce your results.

It’s also worth noting that floating point functions aren’t guaranteed to produce the same results between Julia versions. Specifically, all of the exp family, log2, log10, most of the hyperbolic trig (so also any complex trig), and some of the regular trig will produce slightly different answers in 1.7 than in 1.1.
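If bit-identical output is not strictly required, one way to cope with such last-bit differences is to compare against stored reference values with a tolerance. The sketch below uses a hypothetical helper `results_match` and an arbitrary default tolerance; the perturbed vector only simulates the kind of tiny deviation described above.

```julia
# Compare freshly computed results against stored reference values with a
# relative tolerance instead of exact equality.
function results_match(new::AbstractVector, reference::AbstractVector; rtol = 1e-12)
    length(new) == length(reference) || return false
    return all(isapprox.(new, reference; rtol = rtol))
end

reference = [exp(1.0), log2(10.0)]        # stand-in for values saved from an earlier run
perturbed = reference .* (1 + 4 * eps())  # differs from the reference in the last few bits

results_match(perturbed, reference)  # true: within the relative tolerance
perturbed == reference               # false: not bit-identical
```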

Just using StableRNGs.jl is already easier and less error prone than copying a stdlib and having to serve it to people as well. You’re not getting around providing a Manifest.toml because of all the other packages the code depends on, so using StableRNGs doesn’t incur additional overhead.

Easier yeah but…

StableRNG is currently an alias for LehmerRNG, and implements a well understood linear congruential generator (LCG); an LCG is not state of the art…

I would not recommend this for research. In fact, I would suggest that JuliaLang not recommend this path in their docs either, but rather provide separate builds of Random (e.g. Random17). I would also advise against recommending that entire random sequences be written to a file to reproduce results: some are really gigantic, and there is no point anyway if we have the RNG.

Not sure I understand. If we build Random17 and share it, the only thing that those who want to reproduce results with the same random sequences need to do is load and use Random17. Am I missing something?