Why `Int` instead of `UInt` for indexing?

julia> typeof(length([]))
Int64

It seems like length(::Vector) is always nonnegative. Why is it Int64 and not UInt32 or UInt64?

1 Like

Even if you use UInt, you’d still have to check for != 0 during indexing, not simplifying your code in the slightest. Additionally, checking for <= 0 makes for an easy to distinguish overflow error check. (Though if you check for overflow, it’s really best done with the methods from Base.Checked, depending on how large the overflow is you can always wrap around back into positive numbers)

1 Like

A lot of the reason is just that signed integers have less promotion weirdness, and (especially for 64 bit) you won’t miss the extra bit of length.

9 Likes

That size_t in C++ is an unsigned type is now considered a mistake:

Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point.

The problem is underflow when going below 0. Here’s a toy Julia example of the hazard:

# Return sum of a[1:end-1]
function sumAllButLast(a::Vector{T}) where {T}
    s = zero(T)
    for i in 1:length(a)-1
        s += a[i]
    end
    return s
end

if length(a) was UInt and a happens to be an empty vector, the interval ends up being 1:UInt(0)-1, which is 0x0000000000000001:0xffffffffffffffff.

Int can of course underflow too, but it’s for values near -2^63, and so less likely encountered in practice than values near zero.

20 Likes

It would be cool if we could have more interfaces based on natural numbers. They come up very often in practice and have lots of nice theory (lambda calculus) and easy pattern-matching.

IIRC the reason Julia uses unsafe integer arithmetic is simply that LLVM hasn’t yet provided a native-speed API for it? SaferIntegers.jl isn’t too much slower for a lot of use cases (14-25% slower arithmetic).

1 Like

This is only partially true. For most cases, the performance penalty is pretty small, but the possibility of errors removes the ability to vectorize expressions which can be a major performance issue.

3 Likes

Is this relevant for array indexing though? You will run out of memory before you run out of Int64s.

1 Like

The way I see it, length can return one of

  • Int: not a natural number; unsafe but not very unsafe; very fast
  • UInt: natural number; very unsafe; very fast
  • SafeUInt: natural number; safe; not as fast

I’m sure unsigned arithmetic performance is critical for various applications, so I guess safety isn’t an option, which means natural number APIs aren’t possible unless LLVM introduces fast, safe arithmetic. That seems like an unfortunate outcome (since I like natural numbers), so I’m hoping someone can correct me with something more optimistic.

Well, length should not really return anything else but nonnegative integers so you should be fine.

But if you insist, you can just

safe_length(v) = SafeUInt(length(v))

and use that.

1 Like

Maybe it is worth pointing out here that even though length(::Vector) returns an Int, the compiler is sophisticated enough to keep track of the fact that this particular Int is nonnegative. For example:

julia> f(v) = (length(v) < 0 && error("This can never happen!"))
f (generic function with 1 method)

julia> @code_llvm f([])
;  @ REPL[31]:1 within `f'
define i8 @julia_f_6028({}* nonnull align 16 dereferenceable(40) %0) {
top:
  ret i8 0
}
9 Likes

Can you give an example of that slowness? I’m not sure what to try. Here’s incrementing 10 million numbers, which doesn’t seem to be much slower.

julia> @btime 1 .+ $numbers;
  24.697 ms (2 allocations: 76.29 MiB)

julia> @btime 1 .+ $snumbers;
  29.372 ms (2 allocations: 76.29 MiB)

Does this count?

julia> using SaferIntegers, BenchmarkTools

julia> N = 10000;

julia> function f!(x)
           x .= 1 .+ x
       end
f! (generic function with 1 method)

julia> x1 = rand(Int, N);

julia> x2 = rand(SafeInt, N);

julia> @btime f!($x1);
  917.757 ns (0 allocations: 0 bytes)

julia> @btime f!($x2);
  4.447 μs (0 allocations: 0 bytes)
1 Like

If the speed of operations on the length of a vector or the possibility of the length exceeding typemax(Int64) is an issue, you are probably doing something wrong.

1 Like

maybe with the exception of sparse matrix ( or some index as location on a grid map)

but even then, yeah, I still agree