Type restriction on UnitRange

question

#1

Why are the fields of UnitRange restricted to be <: Real? Using that type could make sense for ranges of other types which are represented by integers under the hood (e.g. Date uses StepRange with Day(1), even though it could use UnitRange). This would be useful since UnitRange has a lot of operations defined generically that StepRange does not (e.g. intersect).


#2

My suspicion is that it’s because UnitRange implies/assumes that the step is 1, which only makes sense if the type is real.


#3

We do currently define step(r::AbstractUnitRange) = 1, but we could just as easily define step(r::AbstractUnitRange{T}) where T = oneunit(T) … seems like an improvement to me.

In olden days, we only had the multiplicative identity one(T), which would have made less sense for step, but now that some dashing programmer has added oneunit, maybe it is time to reconsider.
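For readers who have not met the distinction, here is a minimal sketch of one versus oneunit, using Dates.Day from the standard library. The variable names are only for illustration.

```julia
using Dates

# one(T) is the multiplicative identity; for a Period it is a plain number.
mult_identity = one(Day)

# oneunit(T) is "one of T", so it keeps the unit.
unit_step = oneunit(Day)

# The current definition gives a unitless step for every AbstractUnitRange.
current_step = step(1:5)

println((mult_identity, unit_step, current_step))
```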


#4

If it isn’t going to use oneunit it should be renamed to OneRange. :grinning:


#5

Thanks. I found that for my particular use case,


does everything out of the box, so it can substitute for UnitRange.


#6

step(r::AbstractUnitRange{T}) where T = T(1) seems safe. I can no longer remember why I picked 1 rather than T(1).

In contrast, step(r::AbstractUnitRange{T}) where T = oneunit(T) seems to open the door to 1m:10m. As I’ve argued elsewhere, I don’t think that’s a well-defined concept: we all believe that 10m == 1000cm, so why should length(1m:10m) == 10 while length(100cm:1000cm) == 901? (If you want to make a unitful range, physics seems to demand that the user supplies the step as an argument.) The definition step(r::AbstractUnitRange) = 1 is consistent with guarding against such problems.
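The length mismatch can be reproduced without any units package. This is only a sketch: it assumes the lengths are stored as plain integers, with the unit living in the variable name.

```julia
# Lengths stored as plain integers; the unit exists only in the variable name.
start_m, stop_m = 1, 10          # 1 m and 10 m, counted in meters
start_cm, stop_cm = 100, 1000    # the same two lengths, counted in centimeters

len_m = length(start_m:stop_m)       # implied step of 1 m
len_cm = length(start_cm:stop_cm)    # implied step of 1 cm

println((len_m, len_cm))
```

The endpoints describe the same physical lengths in both cases, but the implied step follows the representation, so the number of elements differs.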

In contrast, 1m..10m (as defined in the IntervalSets package) is perfectly well-defined, since there is no implied step.


#7

Because it is a “unit range” and you specified the units?


#8

Because it is a “unit range” and you specified the units?

That’s just a naming thing. Here we’re using unit in the sense of 1, and I know you’d agree that 1m != 1. We could change the name if you think that would help.


#9

This argument seems a bit circular. Yes, I know that’s the current behavior. But I like the name UnitRange and wish we adhered to it more literally.

I honestly don’t see any problem with saying that a:b is a “unit” range in steps of oneunit for the endpoints (promoted to a common type via promote).

I think that would be useful, strictly generalizes the current behavior, and conforms to most people’s expectations of units, e.g. what they would expect for Day(1):Day(10) or 1m:100m if you asked them.


#10

I agree that it seems natural when you are literally typing the characters in the console. But there are many cases where initial impressions lead you down the wrong path. Consider the impact of promotion alone:

julia> UInt8(0x01):Int16(5)    # what would 1mm:2ft do?
1:5

julia> a, b = 1, 5
(1, 5)

julia> shift = 0.1
0.1

julia> (a:b) .+ shift == collect(a+shift:b+shift)
true

# Now try this with a, b = 1m, 5m and shift = 1mm.
# (a:b) .+ shift would have length 5
# a+shift:b+shift might have length 4001

I was trying to think of cases in julia where a == x and b == y and yet op(a, b) != op(x, y). (E.g., a and b in meters and x and y in millimeters, and op is UnitRange.) I don’t doubt that we can do that (thanks to dispatch), but I think we try to avoid it in general. For example, I think you’d be pretty unhappy if 2*x != 2.0*x. But that’s precisely the kind of behavior you’re asking for here. In places where we elevate the type above the concept (e.g., 1:5 != collect(1:5)), people tend to get pretty unhappy.

Asking the user to specify the step is not exactly a lot of work. Really, it’s the only option given that physical units define an equivalence class (that’s their core mathematical property): under such circumstances, there is no such thing as 1, and it would be dangerous to pretend otherwise.

All this seems pretty far from the OP. I agree it doesn’t have to be Real, but the example of Dates is precisely the kind of behavior we don’t want to enable.
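The a, b = 1m, 5m and shift = 1mm case from the comments above can be sketched with plain integers. It assumes everything is stored in millimeters and that promotion would have picked millimeters as the common unit; neither assumption comes from a real units package.

```julia
# All quantities in millimeters, stored as plain integers (an assumption for this sketch).
a_mm, b_mm, shift_mm = 1000, 5000, 1   # 1 m, 5 m, 1 mm

# Build the range with a 1 m step, then shift every element by 1 mm.
shifted_after = (a_mm:1000:b_mm) .+ shift_mm

# Shift the endpoints first; with millimeters as the common unit, the implied step is 1 mm.
shifted_before = (a_mm + shift_mm):(b_mm + shift_mm)

println((length(shifted_after), length(shifted_before)))
```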


#11

I think that the following would be sensible behavior:

  1. a:b is parsed as colon(a, b), which dispatches to UnitRange in general (not StepRange like it does now),
  2. UnitRange(::T, ::T) checks some trait (e.g. has_unit_stepsize(T)); if that is false (the default), it throws an error. It should be true for <: Integer, Date, and similar user-defined types which have a “natural” stepsize.
  3. UnitRange(::S, ::T) promotes the arguments to a common type, calling the previous method.

So promote(1mm, 2ft) would either be undefined (I am unsure which package the example is for), or the UnitRange constructor would fail because it does not have a unit stepsize. 1m:3m would give something equivalent to [1m,2m,3m].
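A rough sketch of the three steps above, written as ordinary functions rather than changes to Base. Both has_unit_stepsize and unit_range are hypothetical names; unit_range stands in for the proposed UnitRange constructor and simply returns a range with an explicit step.

```julia
using Dates

# Hypothetical trait: does type T have a "natural" unit step? Default is false.
has_unit_stepsize(::Type) = false
has_unit_stepsize(::Type{<:Integer}) = true
has_unit_stepsize(::Type{Date}) = true

# Hypothetical stand-in for UnitRange(::T, ::T): check the trait, then build the range.
function unit_range(a::T, b::T) where {T}
    has_unit_stepsize(T) ||
        throw(ArgumentError("$T has no unit step size; specify the step explicitly"))
    return a:oneunit(b - a):b
end

# Stand-in for UnitRange(::S, ::T): promote to a common type, then call the method above.
unit_range(a, b) = unit_range(promote(a, b)...)

println(collect(unit_range(UInt8(1), Int16(5))))
println(collect(unit_range(Date(2017, 1, 1), Date(2017, 1, 3))))
```

With these definitions, unit_range(1.0, 2.0) throws an ArgumentError because Float64 does not opt in to the trait.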


#12

I’m not saying we couldn’t define those methods, I’m saying we shouldn’t. The word “natural” is really scary: if I use a start of DateTime(2017, 10, 1) and a stop of DateTime(2017, 12, 1), what’s natural? A step of a day or a month? (Both those dates are the first days of their respective months.) Remember that a number with physical units corresponds to some external reality independent of how you choose to describe that reality. If I tell you that I marked out a playing field by drawing lines of spacing 1 between my mailbox and my ditch, even if you know my yard you have no clue how many lines I drew. In contrast, if I say “1 fathom” then you know, and if you’d personally rather calculate in meters you can convert everything and come to the same answer I would while working in fathoms. You get the same answer independent of representation: that’s the entire point of physical units.
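To illustrate the day-versus-month question with the same two endpoints, here is a small sketch using the Dates standard library. It spells the step out explicitly in both cases.

```julia
using Dates

start, stop = DateTime(2017, 10, 1), DateTime(2017, 12, 1)

# Same endpoints, two equally plausible "natural" steps.
by_day = collect(start:Day(1):stop)
by_month = collect(start:Month(1):stop)

println((length(by_day), length(by_month)))
```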

UnitRange(::S, ::T) promotes the arguments to a common type, calling the previous method.

That was the point of my example: if promote(1mm, 2ft) promoted to ft, you’d get a range of length 2; if it promoted to mm, you’d get a range of length 609. Which one is “natural”? Your only defense is not to define promotion, but then that would mean that you can’t insert 10mm into a Vector{Meter}, which doesn’t make sense either.

If you allow 1 to be equivalent to 1m then you come to some pretty strange conclusions, like 1s == 1day (because of convert(Dates.Second, convert(Int, Dates.Day(1))), see https://github.com/JuliaLang/julia/issues/19896). I don’t think anyone thinks that makes sense. But this isn’t an artificial example: you could hit it easily simply by trying to store values in arrays (since setindex! calls convert).

Until your post I didn’t fully realize that we currently support a:b for Dates. Yikes. By comparison, in a very well thought-out package for physical Units:

julia> using Unitful: s

julia> 1s:10s
ERROR: DimensionError: s and 1 are not dimensionally compatible.
Stacktrace:
 [1] colon(::Quantity{Int64, Dimensions:{𝐓}, Units:{s}}, ::Quantity{Int64, Dimensions:{𝐓}, Units:{s}}) at ./range.jl:9

julia> 1s:1s:10s
1 s:1 s:10 s

The extra effort to specify the range concretely is tiny in comparison to breaking the distributive property for ranges (see my shift example above), and tiny even in comparison to checking the documentation to see what someone has arbitrarily decided that “natural” means.


#13

Let me rephrase it then: if

  1. a concrete type T
  2. can take only discrete values
  3. which can be mapped to a contiguous subset of integers with some affine transformation f (e.g. identity for integers, rata die for Dates),

then let UnitRange(x::T,y::T) denote the set of all possible values between x and y, inclusive. This is what I meant by “natural”.

Let’s go through the examples:

  1. DateTime(2017, 10, 1):DateTime(2017, 12, 1) would represent all milliseconds between these two dates.
  2. promote(1mm, 1ft) is (1//1000 m, 381//1250 m), does not map to <: Integer, UnitRange should throw an error. The user should use StepRange.
  3. 1s is Quantity{Int64, Dimensions:{𝐓}, Units:{s}}, which has Int64 as the underlying representation. 10s similarly. So 1s:10s is OK, equivalent to [1s,2s,...,10s].
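The affine-mapping idea can be tried today for Date by doing the mapping by hand. This sketch uses Dates.datetime2rata and Dates.rata2datetime for the Rata Die mapping; to_int, from_int and date_span are hypothetical helper names, and the two spans are made-up data.

```julia
using Dates

# Affine map f from Date to Int (Rata Die day number) and its inverse.
to_int(d::Date) = Dates.datetime2rata(d)
from_int(n::Integer) = Date(Dates.rata2datetime(n))

# Hypothetical helper: represent "all dates from a to b" as a UnitRange of day numbers.
date_span(a::Date, b::Date) = to_int(a):to_int(b)

spell_a = date_span(Date(2017, 1, 1), Date(2017, 3, 31))
spell_b = date_span(Date(2017, 3, 1), Date(2017, 6, 30))

# The generic UnitRange methods (intersect and friends) now apply directly.
overlap = intersect(spell_a, spell_b)
overlap_dates = (from_int(first(overlap)), from_int(last(overlap)))

println(overlap_dates)
```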

#14

So then length(DateTime(2017, 10, 1):DateTime(2017, 12, 1)) == 5270400001, right?
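That count can be checked without building the range. The sketch relies on DateTime having millisecond resolution, so subtracting two DateTime values gives a Millisecond period.

```julia
using Dates

span = DateTime(2017, 12, 1) - DateTime(2017, 10, 1)   # a Millisecond period
n_values = Dates.value(span) + 1                        # both endpoints included

println(n_values)
```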

Note again that DateTime(2017, 10, 1)..DateTime(2017, 12, 1) (representing an interval from IntervalSets) is a much better way of saying “all times between those two dates”, because an interval doesn’t imply a step. By not implying something you’re not controlling, you sidestep all the concerns I have raised here and I have no objections of any kind. That’s why intervals are such a fundamental type.

promote(1mm, 1ft) is (1//1000 m, 381//1250 m), does not map to <: Integer, UnitRange should throw an error. The user should use StepRange.

How do you feel about this:

julia> 1//3 : 4//3
1//3:4//3

1s is Quantity{Int64, Dimensions:{𝐓}, Units:{s}}, which has Int64 as the underlying representation. 10s similarly. So 1s:10s is OK, equivalent to [1s,2s,...,10s]

We have

julia> 1.1:10
1.1:1.0:9.1

and it returns a “StepRange” (really a StepRangeLen), not a UnitRange. Making distinctions based on floating-point vs integer but being “sloppy” about dimensionless and unitful seems backwards to me. If I’m grading a physics test, I’ll give full credit to both 3.2m and 32//10 m. I won’t give full credit to 3.2.
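A quick way to see which range type each of the two examples produces; this is just an inspection sketch with illustrative variable names.

```julia
r_rational = 1//3 : 4//3   # Rational endpoints
r_float = 1.1:10           # mixed Float64 and Int endpoints

println(typeof(r_rational))
println(typeof(r_float))
println(collect(r_rational))
```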


#15

Yes. Is there a problem with this?

Since Rational{Int64} cannot be mapped to integers using an affine mapping, this should signal an error. I realize that my proposal would break existing code. This should be fixed by making the stepsize explicit.

It is possible that you misunderstand me or I was not clear. I don’t think I advocated being sloppy about units. My proposal above only concerns units when arguments with different types are promoted to a common type which maps to integers, that’s where the “sloppiness” could come in.

I do agree that IntervalSets is a great way to work around some of the problems. I am perfectly fine with submitting PRs to that package to obtain some behavior (I may also need iterators etc). But I also think that my proposal above is consistent (and of course breaking).

If I understand correctly (please correct me if I am wrong), you would prefer a:b to mean

a:d:b with d = 1, whenever a step of 1 makes sense; when in doubt, a:b should not be defined.

OTOH I want a:b to mean

all possible values between a and b, inclusive, whenever that makes sense (as I described above); when not, a:b should not be defined.

I think both can be made to work, but not at the same time. The intersection is pretty much a:b defined for integers.


#16

all possible values between a and b, inclusive, whenever that makes sense (as I described above); when not, a:b should not be defined.

That’s a great concept (it’s an Interval), but Julia has long used colon to create an AbstractRange which means a container of discrete values. It’s not so useful to have length(1.0f0:1.0001f0) == 840 simply because there are 840 Float32s between those two numbers. Nor is length(1.0f0..1.0001f0) a particularly useful concept (it depends entirely on the choice of how many mantissa/exponent bits there are, and in general I don’t think we want to allow such things to lead to dramatic differences).
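The Float32 count can be obtained from the bit patterns. This sketch relies on the fact that, for positive floats, adjacent values have adjacent integer bit patterns.

```julia
lo, hi = 1.0f0, 1.0001f0

# For positive floats, consecutive values have consecutive bit patterns,
# so the difference of the reinterpreted integers counts the steps between them.
steps = reinterpret(Int32, hi) - reinterpret(Int32, lo)
n_floats = steps + 1    # include both endpoints

println(n_floats)
```

The result depends only on how many mantissa bits Float32 has, which is the point being made above.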


#17

I am afraid you are ignoring an important part of my proposal: the requirement of an affine mapping to integers (which defines the “natural” stepsize). The above would throw an error under my proposal.

Almost. I want a finite collection of all (equally spaced) values. In a sense, the combination of UnitRange and ClosedInterval. ClosedInterval can be made to work with this, by defining methods for some types. But not, of course, for <: AbstractFloat and similar. If I make PRs for IntervalSets (as I recently did), it would lead to a situation where some methods (e.g. iteration) work for some subtypes of ClosedInterval, but not for others. Would you be OK with this?

In any case, Discourse is now warning me that I am talking to you too much :slight_smile: Thank you for taking the time to discuss, I will keep working on the actual data analysis problem that motivated this whole topic for me (insurance spells, delimited by dates, I need to check spells for overlap, intersections, etc) and will see what API I would need to make that easier.


#18

Goodness, I didn’t know it did that. In replying to this I’m getting the same warning. For the record, I don’t think 3 messages proposing interesting design ideas about an important topic is too much :slightly_smiling_face:.

Anyway, thanks for reminding me about your intended limitation on :. We can discuss iteration over elements of ClosedInterval in IntervalSets.