(This is also true with fixed-point arithmetic, and in fact roundoff error growth is typically even worse with fixed point. There is no panacea when you are dealing with a finite subset of the real numbers. The only thing that is improved here is that you are getting rid of floating point’s signed zero.)
True. There is no getting around the limits of finite bit representation. IMO the FixedPoints give less of a false sense of security and are easier to reason about. Perhaps even contrary to intuition, the big numerical errors which appear in FixedPoints draw programmer attention to problematic points in calculation instead of slower loss of precision in point-floaters (hence the use of creeping). But both representation have their respectable place.
I feel like most of the supposed “easier reasoning” of fixed point is simply because it is decimal, so it doesn’t round human inputs. If you use decimal floating point, you get this advantage too, along with the nicer roundoff characteristics (due to increased dynamic range and fixed relative precision, rather than fixed absolute precision) of floating point.
@Tamas_Papp, thanks for the link and the thorough, if not extended, discussion on why ==
is different from isequal
. I think this best falls into the category of unusual, but intended behavior.
I guess that leaves how to get around this behavior (if desired) with the least kludgey code possible within reason. Right now, my simple quick fix looks like:
for root_idx in 1:length(p)
if imag(p[root_idx]) == 0.0
p[root_idx] = real(p[root_idx]) + 0im
end
end
In writing this, I realized it could be simplified as:
neg_zero_idx = isequal.(imag(p),-0.0)
p[neg_zero_idx]=real(p[neg_zero_idx])+0im
Can anything be done further? Or am I overthinking it at this point?
I am probably missing some context here for your code so I can’t comment on your solution. In general, you can just map members of an equivalence class to a well-defined element and work with that. Eg
eqmap(x) = x
eqmap(x::AbstractFloat) = iszero(x) && signbit(x) ? -x : x
Then form sets & dicts using the above.
I should perhaps make a few points here. First of all, this has nothing to do with floating-point precision or rounding. It has everything to do with signed zeros being special and weird in the IEEE floating-point standards.
There are three standard equality relations in Julia:
-
===
– object identity / programmatic distinguishability (aka “egal”); -
==
/<
– “intuitive” / IEEE equality and comparison; -
isequal
/isless
– hashing & sorting.
The ===
relation is not user-definable and is guaranteed to be a true equivalence relation.
The ==
is not an equivalence relation because IEEE dictates this whacky nonsense with NaN != NaN
. It also has -0.0 == 0.0
and !(-0.0 < 0.0)
(thank goodness, if these disagreed it would be a true disaster). The ==
and <
functions are user definable, so people do whatever they want, but ==
should generally be an equivalence relation other than NaN
-like values, and it should be compatible with <
, which should be a partial pre-order.
The isequal
relation is not generally meant to be used directly, but it is used for hashing and sorting. The isequal
predicate is an equivalence relation (transitive, reflexive), and it must be compatible with isless
. In addition to the usual non-reflexive ==
relation, IEEE also defines a total ordering which is actually an equivalence relation and orders -0.0 strictly before 0.0 (i.e. not equal). For floating-point values, the isless
and isequal
relations generally follow this IEEE total ordering. Here’s where it gets “interesting”:
-
Since we use
isless
for sorting and IEEE dictates that -0.0 sorts before 0.0 we must haveisless(-0.0, 0.0)
, otherwise -0.0 and 0.0 would end up randomly scrambled in the middle of sorted vectors. -
Because
isequal
is compatible withisless
– i.e. for allx
andy
of the same type,isless(x, y)
orisequal(x, y)
orisless(y, x)
but not more than one – we must therefore have!isequal(-0.0, 0.0)
.
Thus the IEEE standard seems to require that -0.0 and 0.0 hash differently. There are some obvious questions…
Q: Why not just use different relations for hashing and sorting?
A: Hashing and sorting are often used as implementation details behind similar data structures. For example, a Dict
uses a hash table – i.e. hash
and isless
– but a SortedDict
probably uses a tree which uses the isless
comparison predicate. If we used different relations for these data structures, then switching from a Dict
to a SortedDict
would not only change the key ordering (the obvious change), but it would also subtly change the bucketing behavior. Say, for example, we sorted -0.0 before 0.0 but considered them equal for hashing. Then they would be in the same bucket in a Dict
but in different buckets in a SortedDict
. That sort of subtle behavior change would far worse than the current consistent separation of -0.0 and 0.0.
Q: Why not use something besides hashing for sets? Isn’t this allowing the implementation detail that Set
is implemented using Dict
leak through?
A: We could use a different structure, but however we do it, equivalence has to be decided by a reflexive and transitive relation. Of the three equality-like relations in the language, only ===
and isequal
are true equivalence relations. ==
is not because of NaN
(at least). The ObjectIdDict
type uses ===
but this is generally not what you want since, e.g. 1 !== 1.0
, [1,2] !== [1,2]
. ===
is not a very user-friendly relation, despite its fundamental correctness. That leaves isequal
. The fact that a Dict
is a fast, efficient way to bucket things with respect to the isequal
relation makes Dict
a sensible choice for implementing sets. But we could just as well use a vector and just linearly scan through to see if any value in the vector is already isequal
to a new element – that would be slow but it would work similarly with respect to -0.0 and 0.0. So using isequal
for bucketing isn’t really an implementation detail of Dict
at all – it’s the only viable choice.
Q: Why not just ignore IEEE and have isequal(-0.0, 0.0)
and !isless(-0.0, 0.0)
? Let -0.0 and 0.0 be scrambled in the middle of sorted arrays – does anyone really care? Or hack in a pass at the beginning of sorting type floating-point arrays that separates values by sign first so they’re generally ordered correctly even though isless
doesn’t dictate that they must be.
A: We’ve very seriously considered this. IEEE was very clearly not designed with the consequences for programming language semantics in mind. The fact that the standard equality relations is not reflexive is pretty terrible. The fact that -0.0 and 0.0 are equal – but not really! – is, in some ways, even worse (as you’re seeing here). So we could just say that -0.0 and 0.0 sort and hash the same in Julia, to hell with IEEE.
That would leave sorting in a bit of a strange situation, however. We would like for sorting a Vector{Float64}
and sorting a Vector{Any}
that happens to only contain values that are either Float64
or isequal
to them to sort in indistinguishable orders. However, the former uses quicksort for speed – since you don’t need a stable sorting algorithm for floating-point values. Since NaN
s are all isequal
, we want them to end up in the same order they were originally in (to have the same effect as a stable sort). To accomplish this, we stably move the NaN
values to the end of the Vector{Float64}
as a first pass before sorting the rest of the numbers. It’s unclear what one would do about signed zeros if they didn’t compare differently. If we did nothing then the signed zeros in the middle of the sorted vector would be the only thing leaking the details of the unstable sorting algorithm at the end. We could try to do something like we do for NaNs, but it’s unclear how to do that efficiently – unlike NaNs, you don’t know in advance where in an array the zeros are supposed to end up. We could also sort signed zeros by their sign as a post-processing pass, but then we give up the equivalence of sorting a Vector{Float64}
and a Vector{Any}
with the same values in it. Perhaps that doesn’t matter, however.
This is a somewhat unlikely end point for the seemingly innocuous questions “Why do sets consider -0.0 and 0.0 to be different? Shouldn’t they be considered the same?” Despite concerns about deviating from IEEE and the sorting of signed zeros, we could very possibly make isequal(-0.0, 0.0)
. But this is not a small decision and it’s unclear that such a change wouldn’t lead to just as many problems and confusions – just different ones.
See https://github.com/JuliaLang/julia/issues/18485 where this option was decided against in a previous release cycle. That doesn’t mean this decision is final, but I’m not sure it’s a can of worms we want to reopen.
To put it much more succinctly, there are three options:
-
isless(-0.0, 0.0)
⟹!isequal(-0.0, 0.0)
⟹length(unique([-0.0, 0.0])) == 2
-
isequal(-0.0, 0.0)
⟹!isless(-0.0, 0.0)
∧length(unique([-0.0, 0.0])) == 1
a.sort!([0.0, -0.0])
is[0.0, -0.0]
(don’t know how to implement this efficiently)
b.sort!([0.0, -0.0])
is[-0.0, 0.0]
butsort!(Any[0.0, -0.0])
is[0.0, -0.0]
Our current choice is 1. The other choices only seem to move the confusion around, not reduce it.
How about introducing new floating point types, which behave like Float64
et al most of the time, but which hash and sort -0.0
differently? Then the user could select the kind of confusion that he prefers.
(I’m not saying the Julia devs should do this. But maybe someone who needs it anyway could put it in a package…)
Could be an interesting experiment. My major concern would be promotion (eg the result type of +(::Float64, ::DamnIEEEFloat64)
). Propagation in either direction could lead to very subtle bugs.
Of course, a simple solution is this:
julia> x = [1-0im; 1+0im; 1+0im; 1+0im];
julia> unique(y->y+zero(y), x)
1-element Array{Complex{Int64},1}:
1+0im
That is a simpler solution, I might just point out that ints don’t have the issue where 0 and -0 are two (bitwise) different numbers. For instance:
julia> -0
0
julia> -0.0
-0.0
And, if we’re looking for absolute minimum characters, I think your example can be further shortened to:
unique(x+zero.(x))
This also makes it so that the sign of the zero imaginary part is predictably positive, instead of whatever happens to be in the first index position. Not a major difference for me, but something that might simplify code for some other usage. I still need to work in not coding in a Matlab accent
Not sure what you mean. x+zero(x)
gives predictably positive zeros. And for integers, the compiler will simplify to the identity function.
In this case, your Matlab accent works in your favor. The expression unique(x.+zero.(x))
should be slower than unique(x+zero.(x))
because it must create a new array before calling unique
. However, the unique
method that it calls is better optimized, so your code is faster than mine.
(unique(f::Callable, C)
uses a Set{ANY}, which is really slow, but still seems to assume that f returns eltype(C)
, so unique(Float64, [1 2 3])
gives a surprising result.)
Sure, for integers this isn’t a problem, but for floats it could help. For instance, taking your original x
vector and converting it to float, the same command produces:
julia> x = [1.0-0.0im; 1.0+0.0im; 1.0+0.0im; 1.0+0.0im];
julia> unique(y->y+zero(y), x)
1-element Array{Complex{Float64},1}:
1.0-0.0im
I apologize if this is explaining an effect you’re familiar with, but unique(y->y+zero(y), x)
does create an array of identical values, but since these are only the values unique
uses to find which indices represent the unique values, the first member of the original vector is what gets returned. So, for instance, if we reverse the order of x
, we instead get:
julia> unique(y->y+zero(y), x[end:-1:1])
1-element Array{Complex{Float64},1}:
1.0+0.0im
This is inconsequential for anything that involves a ==
comparison, but anything that involves a ===
or is
comparison could get tripped up. Conversely, the code unique(x+zero.(x))
is identical regardless of the order of the elements of x
because unique
is operating on an adjusted value of x
directly instead of using an adjusted value of x
for finding the unique indices of the original x
.
julia> unique(x+zero.(x)).===unique(x[end:-1:1]+zero.(x[end:-1:1]))
1-element BitArray{1}:
true
I apologize if that explanation was a bit obtuse, but I hope it helped explain what I meant by “predictably positive zeros”.
EDIT: Just to show that unique(x+zero.(x))
works as promised:
julia> unique(x+zero.(x))
1-element Array{Complex{Float64},1}:
1.0+0.0im
I see a lot of explanations of why isequal(-0.0, 0.0) is false, but not why isequal(-0,0) is true. As a causaul user it seems like a gotcha that these would differ. I’m thinking about data files being interpreted as floats or ints and getting different results depending on how the data are formated in the file. Because of these real-world scenarios, I would want floats and integers to behave the same in these tests.
Not that Julia needs or wants to behave like MatLab, but it uses isequal(0.0, -0.0) == true and I haven’t seen any worms from that particular can, in the 15 years I’ve used it.
18485 didn’t really clarify much for me. For exaple, I don’t follow any of the arugments based on -0.0 being different than 0.0, unless you are talking about representing numbers less than 0 and greater than -eps(0)/2. But then I don’t see why we would fret more about errors crossing 0 than those occuring between every other pair of adjacent floating point representations.
I see a lot of explanations of why isequal(-0.0, 0.0) is false, but not why isequal(-0,0) is true.
The introduction of -0.0
occurred with adoption of the original IEEE Standard for Floating-Point Arithmetic in 1985. There has never been a Standard for Integer Arithmetic that mentions -0
as a value (other than the fact that the negative sign is absorbed, so -0 === 0
).
Just leaving this here:
julia> begin
@show 0.0 == -0.0
@show 1.0/0.0 == 1.0/-0.0
end;
0.0 == -0.0 = true
1.0 / 0.0 == 1.0 / -0.0 = false
Ultimately, I’m not sure I like completely language-defined floating point semantics. That is, I’d like something that has hardware support more than something that makes sense; julia and hence all julia software sits downstream of IEEE / Intel / AMD / ARM / $device_vendor.
For example, sqrt
: We have a perfectly fine CPU instruction for that, which happens to normally return NaN
on negative numbers; why on earth do we check for positivity? Until we can support the positivity check via FPU traps, the default sqrt
should imho not raise an exception (maybe offer “well defined” operators somewhere, but give me architecture-dependent “kinda correct but definitely fast” defaults that compile to the obvious instructions). Or the minimum of two floating point numbers; why on earth
julia> fmin(x,y)= ifelse(x<y, x, y);
julia> @code_native fmin(1.0,2.0)
.text
; Function fmin {
; Location: REPL[43]:1
vminsd %xmm1, %xmm0, %xmm0
retq
nopw %cs:(%rax,%rax)
;}
julia> @code_native min(1.0,2.0)
.text
; Function min {
vmovq %xmm1, %rcx
vmovq %xmm0, %rax
vcmpordsd %xmm0, %xmm0, %xmm2
vblendvpd %xmm2, %xmm1, %xmm0, %xmm2
vcmpordsd %xmm1, %xmm1, %xmm3
vblendvpd %xmm3, %xmm0, %xmm1, %xmm3
vmovapd %xmm2, %xmm4
testq %rcx, %rcx
jns L58
testq %rax, %rax
js L67
L46:
vcmpltsd %xmm0, %xmm1, %xmm0
vblendvpd %xmm0, %xmm2, %xmm4, %xmm0
retq
L58:
vmovapd %xmm3, %xmm4
testq %rax, %rax
jns L46
L67:
vmovapd %xmm3, %xmm4
vcmpltsd %xmm0, %xmm1, %xmm0
vblendvpd %xmm0, %xmm2, %xmm4, %xmm0
retq
nopw %cs:(%rax,%rax)
;}
edit: The “why on earth” is not so much an honest question; I’m sure there are reasons of mathematical consistency, but at some point the silicon we’re running on is ground truth, not a math textbook.
edit2: Ok, FPU traps for sqrt are a stupid idea. Still, 99% of the code is perfectly fine with NaN
square roots, and the rest should use some Base.safe_sqrt
, and the same for floating point comparisons.
You are correct in pointing out the inconsistency between IEEE handling of exceptions / out of domain errors and the rest of the language (where it is customary to throw a DomainError
or ArgumentError
). However, I am not sure that making this more consistent by moving towards the IEEE mechanisms is a good idea, they are less flexible and you will get a hybrid anyway, just the boundary will be somewhere else.
Practically,
-
for some people in some situations the options offered by IEEE are just great, because they mesh well with their algorithm or can be modified at little cost. Few people read the whole IEEE 754 though, or use more than a subset of the options.
-
for others, the whole thing is a nuisance, and
NaN
s are particularly painful since they can creep through your algorithm silently, in which case you have to debug the whole thing to find out where this started. This can be tedious for algorithms with nontrivial complexity.
I had read "Possible bug in unique/Set ", I intentionally referenced the phrase in that post about how the decision to use IEEE standard was not irrevocable, but opening it was a “can of worms”.
I wanted to cast light on casual users. Having different behaviors for integers and floats regarding +/- 0 is a huge trash can of worms. The same applies to differences between == and isequal() for finite numbers. These inconsistencies will certainly lead to many gotcha moments for new users who just expect things to work as expected.
I disagreed with the first statement that this issue is not about rounding. The issue of sorting seems to be the key issue, since the requirement that -0.0 sort before 0.0 is causing problems (I think we agree that -0.0 == 0.0). It seems that the goal of the standard was to accomplish higher precision around zero. This might be important if users are trying to “faithfully” sort numbers that are effectively underflowing, but I would argue this is not a huge issue with Julia since any desired level of precision is already easily available. Despite that, we seem to be deciding to support a standard that attempts to solve a problem Julia doesn’t have.This is why I think this is a small can of worms and definitely worth revisiting before this important milestone you are on.
My suggestion is to make signed floats behave as Ints currently do, and to abandon this part of the standard. If there is a cosmetic problem with -0 sorting randomly with 0s, then replace -0 with 0 since they are equivalent at this level of precision.
Practically nobody wants to catch and handle domain errors. They are nice during testing / development, you’re right. But once I am reasonably sure that they will never appear except for bugs, a NaN
spoiling my computation or an unhandled domain error stopping my computation differ only in the quality of logs for debugging (and this only matters if I can’t reproduce and need to look at logs / stack traces / core dumps). But the speed differences are very relevant (no SIMD).
I mean, are you using any code that relies on domain errors being thrown or min(x,y)
not using the obvious CPU instruction? Most people who expect negative numbers for their square roots check this before, instead of in a try
block.
Hence my wish for “implementation defined” default semantics of floating point: They should do whatever your CPU does, and if your program needs to handle edge cases correctly, then you have written explicit branches for all of them anyway; no real mental overhead of using Base.safe_sqrt
or Base.safe_min
if you are are already thinking about NaN
or negative numbers.
(julia is focused on fast numerics; for security relevant code / systems programming, this could be very bad, but are these problems really Float
-heavy?)
edit: I am not suggesting to move towards IEEE. I am suggesting to move towards “whatever your silicon does” (within bounds of reason), which happens to be IEEE most of the time.