Saving current julia/package information for reproducible research

Hi all, does anyone have a “best practice” for archiving the current Julia package state for the purposes of reproducible research?

What I have in mind is saving a file that lists the required packages, and their versions, that were used to run a certain script. I know about Pkg.status(), but that returns all installed packages; I am looking for something that saves the versions of only the currently loaded packages.

Thanks!

The next-generation package manager will address this, but in the meantime Playground or DeclarativePackages may help.
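Short of those tools, one rough stopgap is to scan a script for its `using`/`import` lines and keep only the matching entries from a name-to-version table. The sketch below is in Python purely as a neutral scripting language. The helper names (`used_packages`, `snapshot`) and the sample versions are made up for illustration, and the regex only handles simple one-line statements, so treat it as a starting point rather than a robust parser.

```python
import re

# Matches simple one-line statements such as `using A, B` or `import A.B: f`.
_STMT = re.compile(r"^\s*(?:using|import)\s+(.+)$")


def used_packages(julia_source):
    """Return the top-level package names named in using/import lines."""
    names = set()
    for line in julia_source.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        match = _STMT.match(line)
        if not match:
            continue
        # `using A: f, g` pulls names out of A only, so cut at the colon.
        modules = match.group(1).split(":", 1)[0]
        for item in modules.split(","):
            item = item.strip()
            if item and not item.startswith("."):
                names.add(item.split(".", 1)[0])  # A.B -> A
    return names


def snapshot(julia_source, installed):
    """Keep only the installed entries that the script actually names."""
    used = used_packages(julia_source)
    return {name: ver for name, ver in sorted(installed.items()) if name in used}


# Hypothetical inputs: the versions are placeholders, not real releases.
# In practice `installed` could be filled in from what Pkg.status() reports.
script = """
using DataFrames, Distributions
import JSON: parse
x = 1  # using Gadfly
"""
installed = {
    "DataFrames": "0.0.1",
    "Distributions": "0.0.2",
    "JSON": "0.0.3",
    "Gadfly": "0.0.4",
}

for name, ver in snapshot(script, installed).items():
    print(name, ver)
```

This only sees packages the script names directly; anything they pull in as dependencies would still have to come from the full installed list.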

Thanks for the response. I am glad that Pkg3 is going to address the issue. In the meantime your recommendations are useful.

How easy is it to bundle up an existing julia configuration (i.e. julia v0.5 plus all the packages I currently have installed) into a docker image for that purpose?

Should be pretty easy. You just need to extract the list of (“top-level”) packages you have installed from .julia/v0.5/REQUIRE and add them to the Dockerfile, as you can see e.g. here
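A minimal sketch of that extraction step, written in Python only for convenience. It assumes the REQUIRE file has one package per line, optionally followed by version bounds, with `#` comments and `@platform` lines. The function names, the `julia:0.5` base image tag, and the sample REQUIRE contents are assumptions for illustration, not taken from the linked Dockerfile.

```python
def top_level_packages(require_text):
    """Extract package names from the text of a REQUIRE file."""
    names = []
    for line in require_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line or line.startswith("@"):
            continue  # platform-specific entries are skipped in this sketch
        name = line.split()[0]  # anything after the name is a version bound
        if name != "julia" and name not in names:
            names.append(name)
    return names


def dockerfile(require_text, base_image="julia:0.5"):
    """Build Dockerfile text that installs each top-level package."""
    lines = ["FROM " + base_image]
    for name in top_level_packages(require_text):
        lines.append("RUN julia -e 'Pkg.add(\"" + name + "\")'")
    return "\n".join(lines) + "\n"


# Hypothetical REQUIRE contents, for illustration only.
example = """
julia 0.5
DataFrames
Distributions 0.10
@osx Homebrew
"""
print(dockerfile(example))
```

Note that `Pkg.add` installs whatever versions resolve when the image is built, so this reproduces the package list but does not by itself pin the versions you had installed.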

I just wanted to ask whether DeclarativePackages (plus fixing the RNG state and running single-threaded) is supposed to be enough for “mostly reproducible” computations. By “mostly” I mean: unless something very weird happens, I should get identical results, including floating-point rounding.

I’m asking because DeclarativePackages does not fix the compiler version used for julia, the libc version, etc. Also, I think that the commit hash for julia does not fix the commit hashes of all its dependencies (LLVM, LAPACK, etc.).

Or do you recommend just fixing the hash of a VM image?

If you’re willing to use VMs and you really want to make sure something is perfectly reproducible, you may want to check out ReproZip. It uses system call tracing to record and bundle up everything you need to run an experiment, including source code, data files, shared libraries – everything. It doesn’t care what language you’re using.

My solution has been to use the official julia Docker repo (Docker Hub) with the Singularity container system, which is geared towards reproducible science (http://singularity.lbl.gov/).