How to maximize replicability of a package

Joris_Pinkse · May 18, 2023, 3:16am

I’m developing a new package for a broad audience with prospective users valuing replicability/reproducibility of results.

On my laptop and desktop which have different versions of Ubuntu, different hardware (AMD versus Intel), different versions of Julia, different versions of included packages, etcetera, the results are identical.

But the github tests (nightly and 1.8) produce numbers that are identical to each other and fairly close, but not identical, to mine. Those machines also run Ubuntu, one runs 1.8, and I haven’t checked all package versions.

How can I maximize the likelihood that results can be replicated by others? (within reason)

Thanks!

gdalle · May 18, 2023, 6:54am

Here are the main things that come to my mind:

commit the Manifest.toml along with the Project.toml
ensure reproducibility of random numbers, eg. with GitHub - JuliaRandom/StableRNGs.jl: A Julia RNG with stable streams

Joris_Pinkse · May 18, 2023, 12:47pm

Thanks @gdalle. I’ll try StableRNGs (currently using Random123 on one thread with a specified seed if replicability is turned on). As for committing the manifest, I want other people to have results that are replicable, but not to constrain them.

PetrKryslUCSD · May 18, 2023, 1:24pm

Also check that the results do not depend on parallel execution (threads?).
For instance parallel loops may produce results with minor differences that would
depend on the number of threads used.

Joris_Pinkse · May 18, 2023, 1:29pm

Thanks @PetrKryslUCSD .

it’s not the rng
it’s not the number of Julia threads, even though I run nested dynamic threaded for loops
the number of BLAS threads appears to matter, but that’s not the only issue

ranocha · May 18, 2023, 3:02pm

It can depend on your workload whether perfect reproducibility can be achieved reasonably well. Even if you have something as simple as matrix vector products of floating point numbers, the exact results of optimized implementations will depend on the CPU architecture you use, e.g., the type of SIMD instructions available. Thus, if you want to achive perfect reproducibility, you need to restrict everything to the least common denominator - do not use BLAS but plain loops without SIMD instructions, do not use any of the great performace packages such as LoopVectorization.jl etc. I am not sure if this is really what you want. For me, that wouldn’t be an option. That’s why I am happy with reproducibility up to floating point differences.
However, the setting can of course be different if you are only concerned about integer arithmetic.

Joris_Pinkse · May 18, 2023, 3:27pm

Thanks @ranocha. It’s not what I want, but what journals require. Several journals in my discipline now require replicability. LoopVectorization, Tullio, etc, are wonderful tools but not for that purpose. For one thing, they turn on fastmath.

stevengj · May 18, 2023, 3:36pm

I’m curious to see the precise wording of these policies.

Cross-platform bit-for-bit replicability is extraordinarily hard to achieve in modern scientific code because it would disallow the use of practically all high-performance libraries. Even a library function as simple as sum (in Julia, Numpy, and probably many other languages/libraries) will give slightly different results depending on whether the CPU has SIMD instructions.

Joris_Pinkse · May 18, 2023, 3:55pm

Thanks @stevengj !

I don’t typically have to deal with such issues myself, but users of the package would. Here’s an example of the workflow at one prominent organization that runs several journals: Step by step guidance - Office of the AEA Data Editor They have a data editor who checks whether s/he can reproduce the results that you are reporting.

stevengj · May 18, 2023, 4:03pm

I looked through these pages, and it doesn’t seem like it gives a precise definition of “replicating” results. However, it says to “identify all outputs as per the README and the list of tables/figures/in-text numbers” and then “compare the output to the tables/figures/numbers in the manuscript”. Note that tables and numbers in the manuscript are very unlikely to be given to more than a few significant digits, and comparing to “figures” is even more imprecise.

So my conclusion from this policy is that they don’t require bit reproducibility, only reproducibility to the level of precision claimed in the manuscript.

Joris_Pinkse · May 18, 2023, 4:08pm

True, that’s what I want to ensure.

stevengj · May 18, 2023, 4:35pm

One wants to make sure that a manuscript only reports accurate digits, i.e. digits that are insensitive to roundoff and other errors. But this is not a matter of bit reproducibility at all — it’s a question of numerical analysis.

(Even if floating-point roundoff errors — and other errors, such as truncation/discretization errors, etcetera — are exactly bit-reproducible, that doesn’t mean that all of the digits are actually significant/converged with respect to the underlying problem being solved!)

For the package author, helping users to do this would mean documenting methods/suggestions to estimate error bars on your return values. (e.g. you may directly return some error estimate, similar to packages like QuadGK, or you may suggest a procedure like repeating the calculation at different precisions or with different tolerance parameters or different random samples.)

Norman · May 18, 2023, 6:04pm

It’s totally normal to see slightly different results on different hardware. The AEA policies are definitely NOT asking for digit-by-digit replicability. If in the paper, the author reports the effect of increasing X on Y is 0.512, for example, then nobody will consider there is a replicability issue if someone gets 0.51201 on their computer. No economist would even bother to ask for these extra digits.

Also, if you try Stata, which is widely used among economists, you will see that the numbers are not completely identical either. This is not something special to Julia.

GunnarFarneback · May 18, 2023, 6:28pm

Having a stored Manifest is fundamental to getting well defined versions of all dependencies and really something you want for reproducibility. And I don’t see how it’s constraining anyone. If someone thinks that the reproducibility isn’t all that important, all they have to do is Pkg.update and it’s like you didn’t provide that Manifest.

Joris_Pinkse · May 18, 2023, 9:10pm

Of course. But are you implying that small numerical inconsistencies cannot accumulate to something substantial? That whenever e.g. fastmath and ieee math produce large numerical differences the end result produced by ieee math is unreliable anyway? I disagree.

Btw: I never mentioned bit-for-bit replicability; didn’t intend that.

Regardless, I know enough now. Thanks all.

stevengj · May 18, 2023, 9:16pm

Of course they can — the accumulation/growth/propagation of error is a major topic of study in numerical analysis. The “error bars” on the results of a numerical calculation are normally much larger than machine precision.

But it has almost nothing to do with whether different hardware etc. produces the same result — just because digits are reproducible doesn’t mean they are accurate! That’s why I think that most of this discussion has centered on a red herring.

Joris_Pinkse · May 18, 2023, 9:32pm

Thanks @stevengj. Sounds like I’ve done the max I can reasonably do in this regard already.

Topic		Replies	Views
How to generate reproducible random numbers across versions via package manifest? General Usage random , reproducibility	27	2159	August 5, 2021
Julia package versioning is a real nightmare to me. What am I doing wrong? General Usage package , versions	17	844	January 21, 2025
Experimental reproducibility: julia vs the rest General Usage reproducibility	13	1622	September 23, 2020
How to use Manifests for a clean environment for reproducibility General Usage reproducibility	6	1577	October 15, 2021
Best practice for academic code reproducibility (Manifest.toml between versions of julia) Package Management reproducibility	6	1163	March 23, 2022

How to maximize replicability of a package

Related topics