Hi all,
I have decided to switch from Matlab/Python to Julia. Before coding seriously, I have implemented some toy examples in order to familiarize myself with the Julian workflow.
However, I am facing a strange problem when trying to subtract a scalar from a StepRangeLen. The following example, run with Julia 1.8.0, illustrates the issue.
Let’s take the following StepRangeLen x:
julia> x = 1e-3:1e-3:1.
0.001:0.001:1.0
Now let’s consider the following operation:
julia> x[1] - 1e-3 # Obviously it is 0.0
0.0
But
julia> y = (x .- 1e-3)[1] # It should be 0.0
-2.0816681711721686e-20
# Convert x into an array
julia> z = (collect(x) .- 1e-3)[1]
0.0
I can’t figure out why the two results differ. Is it a bug, or is there a clear explanation behind this behaviour of StepRangeLen?
Thank you for your help!
StepRangeLen tries to be more accurate than arrays. So, for example,
julia> x = 1e-3:1e-3:1.2e-2
0.001:0.001:0.012
julia> collect(x) .- 1e-3
12-element Vector{Float64}:
0.0
0.001
0.002
0.003
0.004
0.005
0.006
0.007
0.008
0.009000000000000001
0.009999999999999998
0.011
julia> y = x .- 1e-3
-2.0816681711721686e-20:0.001:0.011
julia> collect(y)
12-element Vector{Float64}:
-2.0816681711721686e-20
0.001
0.002
0.003
0.004
0.005
0.006
0.007
0.008
0.009
0.01
0.011
While it doesn’t get the zero exactly, it gets most other numbers more accurately than broadcasting over an array would, and even the zero is closely approximated. I’m not sure whether this is a bug, though, as it likely arises from the fact that the number 1e-3 can’t be represented exactly in binary floating point.
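As a quick check of that last point (a sketch comparing the Float64 literal against the true rational 1/1000 via BigFloat):
julia> Float64(big(1e-3) - 1//1000)   # the literal 1e-3 overshoots the true 1/1000
2.0816681711721686e-20
which is exactly the stray value observed in the original question.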
Without the broadcast, however,
julia> x[1] - 1e-3 == collect(x)[1] - 1e-3
true
julia> x[1] - 1e-3, collect(x)[1] - 1e-3
(0.0, 0.0)
The difference comes in with vectorization, and one possible source is the use of @simd during Vector broadcasting. (Edit: @simd is not the source of the difference, so disregard the rest of this comment.)
For a .- 0.001 on a StepRangeLen, only the initial value of the range gets 0.001 subtracted from it (since the rest of the range can be inferred from that, the step, and the length). So it’s a scalar subtraction.
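A quick way to see this (a sketch, reusing the x from the original post): the result is still a StepRangeLen with the same step and length, only the starting point has moved.
julia> x = 1e-3:1e-3:1.0;

julia> y = x .- 1e-3;

julia> y isa StepRangeLen, step(y), length(y)
(true, 0.001, 1000)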
To do the same broadcast subtraction on an actual Vector (the result of collect), @simd is used, as far as I can tell from following the method chain, to perform the subtraction over the whole vector efficiently. And as per the docs of @simd:
- Floating-point operations on reduction variables can be reordered, possibly causing different results than without @simd.
I’m not sure what sort of reordering on a binary subtraction can lead to this difference though.
No. This has nothing to do with @simd or vectorization. Broadcasting basic arithmetic over ranges is not broadcasting at all; rather, it is specialized to recompute a new range with O(1) cost. How you do this is the hard part, especially how it interacts with all the other smarts that ranges have.
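One way to convince yourself of the O(1) recomputation (a sketch with a deliberately huge range; broadcasting returns a new range rather than materializing the elements):
julia> huge = 0.0:1e-3:1.0e6;   # about a billion elements

julia> (huge .- 1e-3) isa AbstractRange   # a new range, not a 10^9-element Vector
true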
In this case, constructing the step range 1e-3:1e-3:1.0 identifies that the endpoints and step are within floating point precision of the exact \frac{1}{1000}, and then it compensates for that. But that compensation means it’s representing the first point as a higher-precision representation of \frac{1}{1000}. That’s why you’re getting the answer you have there:
julia> Float64(big(1)//1000 - 0.001)
-2.0816681711721686e-20
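Putting the pieces together (reusing the x from the original post), the first element of the shifted range is exactly that correctly rounded difference:
julia> x = 1e-3:1e-3:1.0;

julia> (x .- 1e-3)[1] == Float64(big(1)//1000 - 0.001)
true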
If, however, the values of the step range do not rationalize nicely to \frac{1}{1000}, this will not happen:
julia> x = 1e-3:π/123:1.
0.001:0.025541403687721894:0.9971147438211538
julia> x .- 1e-3
0.0:0.025541403687721894:0.9961147438211538
julia> (x .- 1e-3)[1]
0.0
It can be helpful to think of ranges as higher-precision objects that choose their exact values based upon all the values you give them (within floating point rounding of the values you passed, that is). Just like working with mixed precision in other contexts, this affects the order in which you should do operations. It’s also worth noting that doing floating point math (yes, even something as innocuous as \times10^{-3}) before constructing a range can sometimes lead to surprising results, because the result of that arithmetic isn’t quite the value you expected, which means it is no longer within rounding distance of an even fraction of the other endpoints.
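For instance, here’s a hypothetical example of that last pitfall: the computed step 0.1 + 0.2 is not the closest Float64 to 0.3, so the range machinery cannot snap it to \frac{3}{10} and the endpoint gets dropped.
julia> 0.1 + 0.2                     # not the closest Float64 to 0.3
0.30000000000000004

julia> length(0.0:0.3:0.9)           # literal step snaps to 3/10; endpoint included
4

julia> length(0.0:(0.1 + 0.2):0.9)   # computed step overshoots; 0.9 is dropped
3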
Thanks @jishnub, @digital_carver and @mbauman for your detailed answers.
@mbauman Your explanation makes things clearer for a Julia newbie like me!
Following up on the “order of operations” comment:
julia> y = (1e-3 - 1e-3):1e-3:(1 - 1e-3)
0.0:0.001:0.999
julia> y[1]
0.0
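Alternatively, the keyword form of range sidesteps the endpoint arithmetic altogether by letting you state the start, step, and length directly (a sketch of the same range as above):
julia> r = range(0.0; step=1e-3, length=1000)
0.0:0.001:0.999

julia> r[1]
0.0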
It’s also perhaps worth pointing out that Matlab’s ranges have some “slop” in their endpoints to make things do what you mean a little more often, papering over the floating point wonkiness that is bound to occur. But it too has significant challenges when you compose other operations with it, and (to my knowledge) it’s not specified how it works. It will “snap” endpoints if they’re within some distance of a close multiple of a step, even beyond floating point rounding. (To my memory anyhow; it’s been a long time since I’ve had a Matlab license.)
Thank you for the explanation. It led me to the relevant part of the range implementation, which is pretty complex and intricate.
My mental model of ranges had been that they simply store the start, step, and len components and compute operations on the fly, but now I realize Julia goes to great lengths to preserve precision wherever possible.
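That design is easy to appreciate with a naive loop (a sketch contrasting repeated Float64 accumulation with the range’s compensated indexing):
julia> acc = 0.0;

julia> for _ in 1:10
           acc += 0.1   # rounding error accumulates with each addition
       end

julia> acc
0.9999999999999999

julia> (0.1:0.1:1.0)[10]   # the range snaps to exact tenths and lands on 1.0
1.0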