Why `Int` instead of `UInt` for indexing?

jzr · June 26, 2021, 4:28am

julia> typeof(length([]))
Int64

It seems like length(::Vector) is always nonnegative. Why is it Int64 and not UInt32 or UInt64?

Sukera · June 26, 2021, 4:50am

Even if you use UInt, you’d still have to check for != 0 during indexing, not simplifying your code in the slightest. Additionally, checking for <= 0 makes for an easy to distinguish overflow error check. (Though if you check for overflow, it’s really best done with the methods from Base.Checked, depending on how large the overflow is you can always wrap around back into positive numbers)

Oscar_Smith · June 26, 2021, 4:51am

A lot of the reason is just that signed integers have less promotion weirdness, and (especially for 64 bit) you won’t miss the extra bit of length.

arch.d.robison · June 28, 2021, 2:41am

That size_t in C++ is an unsigned type is now considered a mistake:

Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point.

The problem is underflow when going below 0. Here’s a toy Julia example of the hazard:

# Return sum of a[1:end-1]
function sumAllButLast(a::Vector{T}) where {T}
    s = zero(T)
    for i in 1:length(a)-1
        s += a[i]
    end
    return s
end

if length(a) was UInt and a happens to be an empty vector, the interval ends up being 1:UInt(0)-1, which is 0x0000000000000001:0xffffffffffffffff.

Int can of course underflow too, but it’s for values near -2^63, and so less likely encountered in practice than values near zero.

jzr · June 28, 2021, 2:56am

It would be cool if we could have more interfaces based on natural numbers. They come up very often in practice and have lots of nice theory (lambda calculus) and easy pattern-matching.

IIRC the reason Julia uses unsafe integer arithmetic is simply that LLVM hasn’t yet provided a native-speed API for it? SaferIntegers.jl isn’t too much slower for a lot of use cases (14-25% slower arithmetic).

Oscar_Smith · June 28, 2021, 2:57am

This is only partially true. For most cases, the performance penalty is pretty small, but the possibility of errors removes the ability to vectorize expressions which can be a major performance issue.

Tamas_Papp · June 28, 2021, 7:10am

Is this relevant for array indexing though? You will run out of memory before you run out of Int64s.

jzr · June 28, 2021, 7:41am

The way I see it, length can return one of

Int: not a natural number; unsafe but not very unsafe; very fast
UInt: natural number; very unsafe; very fast
SafeUInt: natural number; safe; not as fast

I’m sure unsigned arithmetic performance is critical for various applications, so I guess safety isn’t an option, which means natural number APIs aren’t possible unless LLVM introduces fast, safe arithmetic. That seems like an unfortunate outcome (since I like natural numbers), so I’m hoping someone can correct me with something more optimistic.

Tamas_Papp · June 28, 2021, 7:55am

Well, length should not really return anything else but nonnegative integers so you should be fine.

But if you insist, you can just

safe_length(v) = SafeUInt(length(v))

and use that.

Per · June 28, 2021, 8:35am

Maybe it is worth pointing out here that even though length(::Vector) returns an Int, the compiler is sophisticated enough to keep track of the fact that this particular Int is nonnegative. For example:

julia> f(v) = (length(v) < 0 && error("This can never happen!"))
f (generic function with 1 method)

julia> @code_llvm f([])
;  @ REPL[31]:1 within `f'
define i8 @julia_f_6028({}* nonnull align 16 dereferenceable(40) %0) {
top:
  ret i8 0
}

jzr · June 29, 2021, 10:36pm

Can you give an example of that slowness? I’m not sure what to try. Here’s incrementing 10 million numbers, which doesn’t seem to be much slower.

julia> @btime 1 .+ $numbers;
  24.697 ms (2 allocations: 76.29 MiB)

julia> @btime 1 .+ $snumbers;
  29.372 ms (2 allocations: 76.29 MiB)

GunnarFarneback · June 30, 2021, 6:12am

Does this count?

julia> using SaferIntegers, BenchmarkTools

julia> N = 10000;

julia> function f!(x)
           x .= 1 .+ x
       end
f! (generic function with 1 method)

julia> x1 = rand(Int, N);

julia> x2 = rand(SafeInt, N);

julia> @btime f!($x1);
  917.757 ns (0 allocations: 0 bytes)

julia> @btime f!($x2);
  4.447 μs (0 allocations: 0 bytes)

gustaphe · June 30, 2021, 6:59am

If the speed of operations on the length of a vector or the possibility of the length exceeding typemax(Int64) is an issue, you are probably doing something wrong.

jling · June 30, 2021, 7:05am

maybe with the exception of sparse matrix ( or some index as location on a grid map)

but even then, yeah, I still agree

Topic		Replies	Views
Why does the type of an index default to Int? Internals & Design question , proposal	4	810	September 10, 2017
Int v UInt General Usage	4	3684	June 6, 2018
A plea for int overflow checking as the default Internals & Design question , proposal , integer-overflow	80	11767	December 17, 2021
Multiplication issue New to Julia question	9	395	May 17, 2023
Potential solution to the overflow problem; 64-bit considered harmful, 32- or 21-bit better General Usage integer-overflow	6	3641	October 18, 2021

Why `Int` instead of `UInt` for indexing?

Related topics