julia> typeof(length([]))
Int64
It seems like length(::Vector)
is always nonnegative. Why is it Int64
and not UInt32
or UInt64
?
julia> typeof(length([]))
Int64
It seems like length(::Vector)
is always nonnegative. Why is it Int64
and not UInt32
or UInt64
?
Even if you use UInt
, you’d still have to check for != 0
during indexing, not simplifying your code in the slightest. Additionally, checking for <= 0
makes for an easy to distinguish overflow error check. (Though if you check for overflow, it’s really best done with the methods from Base.Checked
, depending on how large the overflow is you can always wrap around back into positive numbers)
A lot of the reason is just that signed integers have less promotion weirdness, and (especially for 64 bit) you won’t miss the extra bit of length.
That size_t in C++ is an unsigned type is now considered a mistake:
Because of historical accident, the C++ standard also uses unsigned integers to represent the size of containers - many members of the standards body believe this to be a mistake, but it is effectively impossible to fix at this point.
The problem is underflow when going below 0. Here’s a toy Julia example of the hazard:
# Return sum of a[1:end-1]
function sumAllButLast(a::Vector{T}) where {T}
s = zero(T)
for i in 1:length(a)-1
s += a[i]
end
return s
end
if length(a)
was UInt
and a
happens to be an empty vector, the interval ends up being 1:UInt(0)-1
, which is 0x0000000000000001:0xffffffffffffffff
.
Int
can of course underflow too, but it’s for values near -2^63, and so less likely encountered in practice than values near zero.
It would be cool if we could have more interfaces based on natural numbers. They come up very often in practice and have lots of nice theory (lambda calculus) and easy pattern-matching.
IIRC the reason Julia uses unsafe integer arithmetic is simply that LLVM hasn’t yet provided a native-speed API for it? SaferIntegers.jl isn’t too much slower for a lot of use cases (14-25% slower arithmetic).
This is only partially true. For most cases, the performance penalty is pretty small, but the possibility of errors removes the ability to vectorize expressions which can be a major performance issue.
Is this relevant for array indexing though? You will run out of memory before you run out of Int64
s.
The way I see it, length
can return one of
Int
: not a natural number; unsafe but not very unsafe; very fastUInt
: natural number; very unsafe; very fastSafeUInt
: natural number; safe; not as fastI’m sure unsigned arithmetic performance is critical for various applications, so I guess safety isn’t an option, which means natural number APIs aren’t possible unless LLVM introduces fast, safe arithmetic. That seems like an unfortunate outcome (since I like natural numbers), so I’m hoping someone can correct me with something more optimistic.
Well, length
should not really return anything else but nonnegative integers so you should be fine.
But if you insist, you can just
safe_length(v) = SafeUInt(length(v))
and use that.
Maybe it is worth pointing out here that even though length(::Vector)
returns an Int
, the compiler is sophisticated enough to keep track of the fact that this particular Int
is nonnegative. For example:
julia> f(v) = (length(v) < 0 && error("This can never happen!"))
f (generic function with 1 method)
julia> @code_llvm f([])
; @ REPL[31]:1 within `f'
define i8 @julia_f_6028({}* nonnull align 16 dereferenceable(40) %0) {
top:
ret i8 0
}
Can you give an example of that slowness? I’m not sure what to try. Here’s incrementing 10 million numbers, which doesn’t seem to be much slower.
julia> @btime 1 .+ $numbers;
24.697 ms (2 allocations: 76.29 MiB)
julia> @btime 1 .+ $snumbers;
29.372 ms (2 allocations: 76.29 MiB)
Does this count?
julia> using SaferIntegers, BenchmarkTools
julia> N = 10000;
julia> function f!(x)
x .= 1 .+ x
end
f! (generic function with 1 method)
julia> x1 = rand(Int, N);
julia> x2 = rand(SafeInt, N);
julia> @btime f!($x1);
917.757 ns (0 allocations: 0 bytes)
julia> @btime f!($x2);
4.447 μs (0 allocations: 0 bytes)
If the speed of operations on the length of a vector or the possibility of the length exceeding typemax(Int64)
is an issue, you are probably doing something wrong.
maybe with the exception of sparse matrix ( or some index as location on a grid map)
but even then, yeah, I still agree