Tests failing on Linux in Travis CI

Not sure if this is the right category. I’m experiencing sporadic test failures on Travis CI. They occur only on Linux, not macOS, and only in one particular subtest. My tests use the ≈ operator to compare floats. Some of the float values are not simple, so I had to work out enough decimal places for the expected values to match.

Is there any reason why this operation, or floating point math in general, would produce sporadically different results on Linux for the exact same operations?

Floating point arithmetic should be exactly deterministic given the same input, on the same CPU with the same libraries. If it’s failing only on the (Travis, Linux) combination, there’s a decent chance that the same CPU and math libraries are being used each time (though since each CI job gets a fresh VM or container, that’s not a foregone conclusion).

Most likely the input to your floating point comparison really is changing (perhaps just a little) each time. For example, this could be due to using random sampling in some part of your computation and not reseeding the global PRNG to something fixed at the start of the tests (though @testset does this for you). There are other things that could cause the problem, though, ranging all the way from bugs in your library to bugs in the Julia runtime or its underlying libraries.
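
If randomness does turn out to be involved, here’s a minimal sketch of pinning the global RNG at the top of a test script (only needed if some random draws happen outside @testset blocks, which already reseed for you):

using Random
Random.seed!(1234)   # any fixed seed; makes subsequent random draws reproducible across runs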

If you can link to a failing vs passing Travis job for your library people might be able to offer more specific advice.


Thanks. I’ll see if I can provide more later, but I’m not doing anything random. All the inputs are deterministic, manipulated only by standard arithmetic.

I was able to capture output for the runs that had an error. In the cases where the tests passed, a variable was computed to be 0.0. In the cases where they failed, the same variable was -1.81899e-12. It was always the same wrong value.

This value appears to be special, as it is exactly 8192 * eps(Float64).
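
For reference, since eps(Float64) is 2^-52, that works out to 2^-39:

julia> 8192 * eps(Float64)
1.8189894035458565e-12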


Hmm. 8192 is exactly eps(Float16)/eps(Float32). I wonder if that’s related.


I’ve narrowed down where the sporadic error is occurring:

a = cat(hcat([16.5, 18.4, 12.7, 14.0, 12.8], [39.1, 26.2, 21.3, 35.8, 40.2]),
        hcat([14.5, 11.0, 10.8, 14.3, 10.0], [32.0, 23.8, 28.8, 25.0, 29.3]), dims = 3)
result = sum(c -> sum(c .^ 2), a)

This operation randomly fluctuates between 11354.310000000001 and 11354.31, where the former is the “correct” value that meshes with all the other values being calculated.

Another observation: If I run the operation multiple times in the same test run, the same value is always returned, either the former or latter, never both.

Just FYI, the reason I’m using this particular code and not just sum(a .^ 2) is so that the same code works on both an array of vectors of numbers and a plain array of numbers (see the sketch below). Obviously, I could treat the two cases differently to avoid the issue, but it probably shouldn’t matter?
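
A rough sketch of the two cases I want the same expression to cover (toy data, not my actual test values):

sum(c -> sum(c .^ 2), [1.0 2.0; 3.0 4.0])        # array of numbers: c is each scalar element
sum(c -> sum(c .^ 2), [[1.0, 2.0], [3.0, 4.0]])  # array of vectors: c is each inner vector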

Weird. When you say “randomly”, is this the exact same build of Julia on the same architecture each time? Have you managed to reproduce it outside CI?

Also - which architecture does it occur on, and which julia version?

Oh! Good insight!

The platforms it is failing on are all linux/haswell. It is working on linux/sandybridge, linux/ivybridge and macos/ivybridge.

The Julia version doesn’t matter (0.7/1.0/nightly); it happens on any of them.

If I have the input in vectors, however, it always produces the passing result regardless of architecture.

I’m diving through Base to figure out what is different between them…

Since it’s floating point, the order of operations matters:


julia-1.1> using Random

julia-1.1> sum(c -> sum(c .^ 2), reshape(a[randperm(length(a))], size(a)))
11354.31

julia-1.1> sum(c -> sum(c .^ 2), reshape(a[randperm(length(a))],size(a)))
11354.310000000001

and that order will depend on architectural details like SIMD width.
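
A minimal standalone illustration that regrouping the additions can legitimately change the result:

julia-1.1> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)
false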


Ah, sure. I found an Intel slide deck mentioning compiler options to allow/prevent floating point optimizations that can affect how stable the value is. Oy, I’ll just have to deal.

Thanks to you both.

Since I’m thinking about it (it doesn’t matter for my purposes here), does Julia make such options available? See: https://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf

For normal floating point expressions, I thought we obeyed IEEE floating point semantics by default, except that we do not specify the order of operations in reductions like sum, because specifying a left-to-right ordering (for example) is neither accurate nor fast. I expect that if you used foldl rather than sum you’d get the same results on all CPUs, because foldl guarantees left associativity.
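
A rough sketch of the foldl version of the sum-of-squares above (note the explicit init, so the first element is also squared):

foldl((acc, x) -> acc + x^2, vec(a); init = 0.0)   # strictly left-to-right accumulation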

For controlling floating point optimizations, there’s the @fastmath annotation and the --math-mode={ieee,fast} compiler option. (Personally I’d stay away from setting the fast mode globally except for testing how much performance you might get, and prefer to annotate inner loops with @fastmath if necessary and where the algorithm is insensitive to it). The documentation for math modes is at https://docs.julialang.org/en/v1/manual/performance-tips/index.html#man-performance-annotations-1. See also the docstring for @simd which I think we use in various places in the Base reduction code.
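
For example, a sketch of the kind of localized annotation I have in mind (a hand-written loop, not Base’s sum; @fastmath licenses the compiler to reassociate, so the result may differ slightly between CPUs):

function sumsq_fast(x)
    s = zero(eltype(x))
    @fastmath for v in x
        s += v * v   # may be vectorized/reassociated under @fastmath
    end
    return s
end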


If you’re interested in learning more, I would suggest having a look at Nick Higham’s book Accuracy and Stability of Numerical Algorithms (section 4.2 talks about summation).

If you want a hard (and admittedly fairly conservative) error bound, then you can use:

julia> eps(maximum(a.^2)*length(a))/2*(length(a)-1)
3.456079866737127e-11

The rough explanation is that there are length(a)-1 intermediate addition operations, each of which gives a result less than v = maximum(a.^2)*length(a) in magnitude, and so the rounding error of each addition is at most eps(v)/2.
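
As a sanity check, the small discrepancy reported earlier in the thread (11354.310000000001 vs 11354.31) is comfortably inside that bound:

julia> abs(11354.310000000001 - 11354.31) <= eps(maximum(a.^2)*length(a))/2*(length(a)-1)
true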


While we’re on the subject, it occurred to me that the fact that Haswell has extra FMA capabilities compared to the other architectures mentioned by @BioTurboNick could cause differences in the number of roundings, leading to different results even if associations are preserved. Some experimentation suggests that Julia generally doesn’t use FMA unless specifically requested, but I didn’t have the patience to go down the rabbit hole of methods in cascaded sums to see if that applies here.
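
For a self-contained illustration of the single vs. double rounding difference (not necessarily what happens inside sum here, just the effect of fusing):

x = 1.0 + 2.0^-30
x * x - 1.0        # the multiply rounds away the 2^-60 term, leaving exactly 2^-29
fma(x, x, -1.0)    # the fused (correctly rounded) version keeps it: 2^-29 + 2^-60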

Is this correct, @simonbyrne? That a.^2 inside the bound looks suspicious to me for a sum. Equation (4.4) would suggest that the pessimistic bound (if I’m reading the notation correctly) is

julia> (length(a)-1)*eps(sum(abs.(a)))
1.0800249583553523e-12

Great link, by the way. Side note: this made me curious about the rtol = sqrt(eps(T)) used by isapprox, which I’ve always thought is a surprisingly permissive (though very practical) default. It turns out this was decided over at https://github.com/julialang/julia/issues/12375, though I didn’t get a lot of insight from reading that, other than “it’s convenient” :slight_smile:
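
E.g., just poking at that default in the REPL:

julia> 1.0 ≈ 1.0 + 1e-9   # well inside rtol = sqrt(eps(Float64)) ≈ 1.5e-8
true

julia> 1.0 ≈ 1.0 + 1e-7
false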

The rabbit hole leads to the block-sequential part of the generic reduction code, which I think we use for sum (though not for min and max). So I guess FMA vs. not-FMA is a question of what @simd allows, and it appears to allow many reorganizations.

By the way, I just tried out Cthulhu.jl (https://github.com/JuliaDebug/Cthulhu.jl, “The slow descent into madness”) for exploring this rabbit hole with @descend sum(a). It’s arguably not quite the right tool for the job, but it worked in this case and is super cool. Thank you @vchuravy!


You can always investigate this in depth, but I think the bottom line is that you have to pick the tolerances for your particular use case. E.g. if you are just verifying that the algorithm is “correct” in the sense of catching gross mistakes, you can decide to live with a certain amount of error as a feature of floating point and set, e.g., atol = 10*eps(), rtol = sqrt(eps()). On the other hand, if you are coding some special functions and striving for a particular approximation error measured in ulps, you have to find a way to control these things.
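
For example, a sketch of what that looks like in a test (result and expected are placeholder names):

using Test
@test isapprox(result, expected; atol = 10*eps(), rtol = sqrt(eps()))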


That was simply because the original code above involved an elementwise squaring. You are correct that (4.4) does give a theoretically better bound, but in this case:

julia> (length(a)-1)*eps(sum(abs.(a.^2)))
3.456079866737127e-11

Oh right, of course :slight_smile: