Bug? shuffle() breaks normalisation test

Hi all,

I think I found a Julia bug (version 1.1.0) while writing a function to check whether a tensor is normalized. I wrote tests for it and it seems (somewhat, sometimes) reproducible.

It comes down to this:
I generate some ‘random’ numbers that should be normalized, and they are, right up until the shuffle function is called; after that they no longer seem normalized according to (some of) my tests.

My code and tests can be found here:
https://gist.github.com/dietercastel/541f3228b18ccf1820c916a83ebc5aaf

If this is due to numerical errors (most likely, I think), how can I get around it? What’s a better, more Julia-friendly approach to writing these tests or this code?

I tried the same code in 3 different ways (original file, REPL, the minimal file linked): 2 of the 3 made the fourth test fail. And I’ve seen the second test fail as well (occasionally).

I haven’t tried with a seeded random number generator or a fixed tensor, but no time atm for that. I’ll come back to it later though! I’ll also give it a try on v1.2 soon.

Looking forward to your responses.

you just need this line for your isNorm instead:

isStrictPos(tensor) & (sum(tensor) ≈ 1)

doc:
https://docs.julialang.org/en/v1/base/math/#Base.isapprox
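
For reference, a minimal sketch of what that check could look like (the anonymous positivity check and the 4×4 tensor below are just stand-ins for your isStrictPos and your actual data):

julia> using Random

julia> isNorm(t) = all(x -> x > 0, t) && sum(t) ≈ 1
isNorm (generic function with 1 method)

julia> t = rand(4, 4); t ./= sum(t);   # positive entries, normalised to sum 1

julia> isNorm(t), isNorm(shuffle(vec(t)))
(true, true)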

Well that works indeed. Thanks!

Is that something that has to be done in general in Julia? Replace all == with ≈ ?
Are there some guidelines/reading material about when to use it and when not to? It feels a bit weird, but of course at least it’s explicitly possible in Julia. Do you do it only in tests, or always throughout your code?

Is there also an infix notation for isapprox() ?

my line up there uses infix already, no?

Well, my understanding is that usually you use == (or even === when possible, for better performance etc.). But if you’re testing with or against real-world data, or you have some fitted/normalized data, or you’re aggregating many floats and expecting a particular mathematical answer (for example a normalization constant, or zero), you use ≈. Usually at that step you’re returning the result though, so I’d say usage in a test is also common.
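
The classic illustration of the difference (any float arithmetic that rounds will do):

julia> 0.1 + 0.2 == 0.3   # exact bit-for-bit comparison
false

julia> 0.1 + 0.2 ≈ 0.3    # isapprox tolerates the rounding error
true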

I am not sure that is a good idea; those functions serve a different purpose. And of course ≈ is not transitive.

The underlying issue is floating point, so maybe this is helpful:

It’s important to understand what ≈ does: by default it checks to see if two floating point values are equal for the first half of their significant digits. That’s pretty lenient but also quite standard. To be used with caution, but also essential when checking results that can depend on numerical round-off. I’m afraid the only answer here is to have some grasp of numerical analysis and to use judgement. Replacing all equality checks with ≈ is definitely not a good idea. Nor would it even be sufficient: 0.0 is never approximately equal to any non-zero value, since it has no scale so you can’t know which bits should be considered significant or not. As the docs for isapprox say:
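
In effect, isapprox(x, 0) with the default atol of 0 reduces to x == 0, so a comparison against zero needs an explicit absolute tolerance, roughly like this:

julia> 1e-20 ≈ 0.0                           # false: zero has no scale, so no digits count as "close"
false

julia> isapprox(1e-20, 0.0; atol = 1e-12)    # supply an absolute tolerance instead
true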

In particular, summation, even though it seems like an innocuous operation, is sensitive to data ordering. In fact, you can sum the same set of numbers in different orders and get basically any possible result:
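
For instance, forcing a plain left-to-right reduction with foldl makes the order dependence easy to see:

julia> xs = [1.0, 1e16, -1e16];

julia> foldl(+, xs)            # (1.0 + 1e16) - 1e16: the 1.0 is absorbed
0.0

julia> foldl(+, reverse(xs))   # (-1e16 + 1e16) + 1.0
1.0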

We don’t use naive summation by default in Julia, so things aren’t quite that bad, and if you’re only adding up positive values it’s not possible to have such a pathological situation, but keep this in mind.
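
To see the difference the default pairwise algorithm makes, here is a quick sketch with hypothetical Float32 data (10^7 copies of 0.1f0, whose true sum is about 1.0f6):

julia> xs = fill(0.1f0, 10_000_000);

julia> foldl(+, xs) ≈ 1.0f6   # plain left-to-right accumulation drifts badly
false

julia> sum(xs) ≈ 1.0f6        # Base.sum uses pairwise summation for arrays
true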

Except for generators…

julia> sum(x for x in ones(Float32, 100_000_000))   # sequential fallback: stalls at 2^24
1.6777216f7

True.

Ah yes, but I meant another infix notation. That wasn’t clear indeed, sorry! The thing is, I don’t like how subtle the visual difference is between == and the expanded \approx.

I’ve been looking around a bit and custom infix notations don’t seem to be a thing, or did I overlook something? I was trying to ‘fix’ it with a macro, but macros seem to break with UTF-8 chars (I’ll report in more depth later).
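
One workaround, if you just want a visually distinct spelling: bind isapprox to another Unicode symbol that Julia’s parser already treats as an infix comparison operator, e.g. ≅ (typed as \cong<tab>); as far as I know this needs no macro:

julia> const ≅ = isapprox;   # ≅ already parses as a binary comparison operator

julia> 0.1 + 0.2 ≅ 0.3
true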

Anyway, thanks for the kind help all! :slight_smile: