# Type of array index?

I wonder what the type of indices is for standard arrays? Int, Int64, or . . . ?

I have this use in mind at present:

``````# Collect indices for which v[i] < thresh
function partition_vec(v, thresh)
idx = Vector{Int??????}()
for i in eachindex(v)
if v[i] < thresh
push!(idx, i)
end
end
idx
end
idx = partition_vec(rand(Float64,30), 0.3)
``````

It’s not that I’m likely going to use huge arrays for which the range of index matters, but we’ve seen the problem of `int` as size in C Language. You are supposed to use `size_t` but a lot of programmers used `int` instead and a lot of code broke when programs started to allocate arrays over 2 GB.

A related question is, how does one get the index type of an array? The above code would be cleaner if it were

``````function partition_vec(v, thresh)
idx = Vector{index_type_of(v)}()
. . . .
``````

This is overkill for most cases. I’m asking this just out of curiosity.

Int64

``````julia> v = rand(30)
julia> thresh = 0.3
julia> idx = filter(i->v[i] < thresh, 1:length(v))
11-element Vector{Int64}:
2
4
7
8
12
17
18
24
27
28
29``````

For a “standard” linearly indexed array it is `Int`, which on a 64-bit Julia equals `Int64` and on a 32-bit Julia equals `Int32`. Generally you can use any subtype of `Integer` for doing the indexing.

`typeof(first(eachindex(v)))` should work for your purpose.

Notice that this may be something other than `Int`, e.g. for
`v = view(rand(10, 10), 1:2, 1:2)` or `v = BigInt(1):BigInt(2)`.

6 Likes

There is a `keytype` function that you can use:

``````julia> v = [1, 2, 3];

julia> keytype(v)
Int64

julia> v = view(rand(10, 10), 1:2, 1:2);

julia> keytype(v)
CartesianIndex{2}
``````
7 Likes

It doesn’t necessarily match the type given by `eachindex` though?

``````julia> I = BigInt(1):BigInt(2)
1:2

julia> keytype(I)
Int64

julia> typeof.(eachindex(I))
2-element Vector{DataType}:
BigInt
BigInt
``````

Bug or feature?

1 Like

This code seems to break your `typeof.(eachindex(I))` solution:

``````I = BigInt(1):BigInt(10)^100
typeof.(eachindex(I))   # ERROR: InexactError: Int64(1000000000000000000000000000000000000000 ...
typeof.(size(I))        # (BigInt,)
``````

`keytype` disagrees with `eachindex` in more common cases as well:

``````julia> A = [1 2; 3 4]
julia> keytype(A)
CartesianIndex{2}
julia> eachindex(A) .|> typeof
4-element Vector{DataType}:
Int64
Int64
Int64
Int64
``````

If the `partition_vec` example is close to your actual code, then it’s easier to rewrite using `findall` or `filter`:

``````partition_vec(v, thresh) = findall(<(thresh), v)
# or
partition_vec(v, thresh) = filter(i -> v[i] < thresh, eachindex(v))
``````

The resulting vector type is properly inferred here.

1 Like

That was an illustration, not a solution. `typeof(first(eachindex(I)))` was the proposed solution and does work for this range. That broadcasting cannot create arbitrarily large vectors is a different matter (and you probably don’t have enough memory for a 10^100 element vector).

1 Like

This option does not seem to infer the index type, but your second one does:

``````v = 1.5:BigInt(1):10.5
typeof(first(eachindex(v)))             # BigInt  (and eltype(v) is BigFloat)
findall(<(4), v)                        # 3-element Vector{Int64}
filter(i -> v[i] < 4, eachindex(v))     # 3-element Vector{BigInt}
``````

Why should it be exactly the same type as `first(eachindex(v))`? I meant that the resulting type is concrete and suitable for indexing the original collection.

Again, the difference is present in more common cases of matrices as well: `findall` returns `CartestianIndex`es, while `eachindex` returns `Int`s.

I think you are a bit too hung up on the type of indices. You can index into an array using whatever `Integer` type you want:

``````1.7.1> a = rand(5);

1.7.1> a[UInt8(3)]
0.45629312424449464

1.7.1> a[big(3)]
0.45629312424449464

1.7.1> a
0.45629312424449464
``````

So the index type is not something that you should concern yourself with in almost any case. Just use `Int`, which corresponds to the default machine integer type on your system (64 or 32 bit).

They `keytype` function exists, but as its docstring says, it

is provided mainly for compatibility with the dictionary interface.

Don’t think about it. And anyway, you should use `findall` for this.

Edit: But, okay, let’s say you really need this, for some reason. Then you can still use whatever integer type you want:

``````idx = UInt[]
for i in eachindex(v)
push!(idx, i)
``````

will work, and so will

``````idx = Int8[]
for i in eachindex(v)
push!(idx, i)
``````

since `i` is converted into `eltype(idx)` when you `push!`. You just have to make sure that the integer type is big enough to hold the values you need.

1 Like

Off-topic: Several people have suggested filter-type solutions. If that were actually what I wanted to accomplish, I would use such a solution. (I’m familiar with functional programming.) But, my real code needs to get both sides and is, currently,

``````function partition_vals(vals, thresh)
idx_l = Vector{keytype(vals)}() # I've learned this trick from this thread
idx_h = Vector{keytype(vals)}()
for i in eachindex(vals)
(vals[i] < thresh) ? push!(idx_l, i) : push!(idx_h, i)
end
(idx_l, idx_h)
end
# . . .
idx_l, idx_h = partition_val(vals, thresh)
``````

That’s why I named my function “partition”. This is an example from Ruby:

``````[3, 10, 5, 9].partition {|v| v < 6} # => [[3, 5], [10, 9]]
``````

(Not functionally equivalent, because the above doesn’t return the indices, but you get the idea.)

Okay, so, I thought about it further and have come up with this solution:

``````function partition_vec(v, thresh)
idx_l = findall(<(thresh), v)
idx_h = setdiff(collect(eachindex(v)), idx_l)
(idx_l, idx_h)
end
``````

I’m happy with it. Thanks!

You can do this, but as I said, `keytype` doesn’t really mean anything special for arrays, use whichever `Integer` type that suits your application, in fact, the most idiomatic would be to write

``````idx_l = Int[]
idx_h = Int[]
``````

As for this:

Rule of thumb: Never use `collect`. 99% of the time, it just needlessly creates a temporary array.

``````idx_h = setdiff(eachindex(v), idx_l)
``````

However, fwiw, using `deleteat!()` with `collect()` seems to run much faster than `setdiff()` without it:

``````function partition_vec(v, a)
idx_l = findall(<(a), v)
idx_h = deleteat!(collect(axes(v,1)), idx_l)
(idx_l, idx_h)
end
``````

Faster with `collect`? That must be a bug 1 Like

Thank you for the very useful advice!

So, I wonder where one finds a documentation that includes the interfaces?

The main Julia website offers an extremely friendly and comprehensive set of “tutorials”. That’s where I learned about `setdiff`. But all the examples, as far as I recall, use vectors for its arguments.

It would be nice if there is a set of documentation where one finds something like

``````setdiff(Vector{T}, . . . ) where T
setdiff(UnitRange{T}, . . . ) where T
. . . .
``````

I use a julia-mode on emacs and it shows something like this

``````. . . .
setdiff(s::Core.Any)
setdiff(itr::Core.Any, itrs::Vararg{Core.Any})
``````

taken from the source code of the standard library, which isn’t very helpful as a description of the user interface of `setdiff`, except to people who have a lot of experience in implementing Julia libraries. (In hindsight, the name “itr” must mean “iterator”, which suggests that both UnitRange and Vector are treated like an iterator by `setdiff`, . . .)

I don’t really think that’s going to happen. Look at the documentation of `setdiff` Collections and Data Structures · The Julia Language

``````setdiff(s, itrs...)
Construct the set of elements in s but not in any of the iterables in itrs. Maintain order with arrays.
``````

It describes a completely generic function, which is how it should be.

1 Like