Type of array index?

I wonder what the type of indices is for standard arrays? Int, Int64, or . . . ?

I have this use in mind at present:

# Collect indices for which v[i] < thresh
function partition_vec(v, thresh)
  idx = Vector{Int??????}()
  for i in eachindex(v)
    if v[i] < thresh
      push!(idx, i)
    end
  end
  idx
end
idx = partition_vec(rand(Float64,30), 0.3)

It’s not that I’m likely going to use huge arrays for which the range of index matters, but we’ve seen the problem of int as size in C Language. You are supposed to use size_t but a lot of programmers used int instead and a lot of code broke when programs started to allocate arrays over 2 GB.

A related question is, how does one get the index type of an array? The above code would be cleaner if it were

function partition_vec(v, thresh)
  idx = Vector{index_type_of(v)}()
  . . . .

This is overkill for most cases. I’m asking this just out of curiosity.

Int64

julia> v = rand(30)
julia> thresh = 0.3
julia> idx = filter(i->v[i] < thresh, 1:length(v))
11-element Vector{Int64}:
  2
  4
  7
  8
 12
 17
 18
 24
 27
 28
 29

For a “standard” linearly indexed array it is Int, which on a 64-bit Julia equals Int64 and on a 32-bit Julia equals Int32. Generally you can use any subtype of Integer for doing the indexing.

typeof(first(eachindex(v))) should work for your purpose.

Notice that this may be something other than Int, e.g. for
v = view(rand(10, 10), 1:2, 1:2) or v = BigInt(1):BigInt(2).

6 Likes

There is a keytype function that you can use:

julia> v = [1, 2, 3];

julia> keytype(v)
Int64

julia> v = view(rand(10, 10), 1:2, 1:2);

julia> keytype(v)
CartesianIndex{2}
7 Likes

It doesn’t necessarily match the type given by eachindex though?

julia> I = BigInt(1):BigInt(2)
1:2

julia> keytype(I)
Int64

julia> typeof.(eachindex(I))
2-element Vector{DataType}:
 BigInt
 BigInt

Bug or feature?

1 Like

This code seems to break your typeof.(eachindex(I)) solution:

I = BigInt(1):BigInt(10)^100
typeof.(eachindex(I))   # ERROR: InexactError: Int64(1000000000000000000000000000000000000000 ...
typeof.(size(I))        # (BigInt,)

keytype disagrees with eachindex in more common cases as well:

julia> A = [1 2; 3 4]
julia> keytype(A)
CartesianIndex{2}
julia> eachindex(A) .|> typeof
4-element Vector{DataType}:
 Int64
 Int64
 Int64
 Int64

If the partition_vec example is close to your actual code, then it’s easier to rewrite using findall or filter:

partition_vec(v, thresh) = findall(<(thresh), v)
# or
partition_vec(v, thresh) = filter(i -> v[i] < thresh, eachindex(v))

The resulting vector type is properly inferred here.

1 Like

That was an illustration, not a solution. typeof(first(eachindex(I))) was the proposed solution and does work for this range. That broadcasting cannot create arbitrarily large vectors is a different matter (and you probably don’t have enough memory for a 10^100 element vector).

1 Like

This option does not seem to infer the index type, but your second one does:

v = 1.5:BigInt(1):10.5
typeof(first(eachindex(v)))             # BigInt  (and eltype(v) is BigFloat)
findall(<(4), v)                        # 3-element Vector{Int64}
filter(i -> v[i] < 4, eachindex(v))     # 3-element Vector{BigInt}

Why should it be exactly the same type as first(eachindex(v))? I meant that the resulting type is concrete and suitable for indexing the original collection.

Again, the difference is present in more common cases of matrices as well: findall returns CartestianIndexes, while eachindex returns Ints.

I think you are a bit too hung up on the type of indices. You can index into an array using whatever Integer type you want:

1.7.1> a = rand(5);

1.7.1> a[UInt8(3)]
0.45629312424449464

1.7.1> a[big(3)]
0.45629312424449464

1.7.1> a[3]
0.45629312424449464

So the index type is not something that you should concern yourself with in almost any case. Just use Int, which corresponds to the default machine integer type on your system (64 or 32 bit).

They keytype function exists, but as its docstring says, it

is provided mainly for compatibility with the dictionary interface.

Don’t think about it. And anyway, you should use findall for this.

Edit: But, okay, let’s say you really need this, for some reason. Then you can still use whatever integer type you want:

idx = UInt[]
for i in eachindex(v)
    push!(idx, i)

will work, and so will

idx = Int8[]
for i in eachindex(v)
    push!(idx, i)

since i is converted into eltype(idx) when you push!. You just have to make sure that the integer type is big enough to hold the values you need.

1 Like

Off-topic: Several people have suggested filter-type solutions. If that were actually what I wanted to accomplish, I would use such a solution. (I’m familiar with functional programming.) But, my real code needs to get both sides and is, currently,

function partition_vals(vals, thresh)
  idx_l = Vector{keytype(vals)}() # I've learned this trick from this thread
  idx_h = Vector{keytype(vals)}()
  for i in eachindex(vals)
    (vals[i] < thresh) ? push!(idx_l, i) : push!(idx_h, i)
  end
  (idx_l, idx_h)
end
# . . .
idx_l, idx_h = partition_val(vals, thresh)

That’s why I named my function “partition”. This is an example from Ruby:

[3, 10, 5, 9].partition {|v| v < 6} # => [[3, 5], [10, 9]]

(Not functionally equivalent, because the above doesn’t return the indices, but you get the idea.)

Okay, so, I thought about it further and have come up with this solution:

function partition_vec(v, thresh)
  idx_l = findall(<(thresh), v)
  idx_h = setdiff(collect(eachindex(v)), idx_l)
  (idx_l, idx_h)
end

I’m happy with it. Thanks!

You can do this, but as I said, keytype doesn’t really mean anything special for arrays, use whichever Integer type that suits your application, in fact, the most idiomatic would be to write

idx_l = Int[]
idx_h = Int[]

As for this:

Rule of thumb: Never use collect. 99% of the time, it just needlessly creates a temporary array.

Instead, write:

idx_h = setdiff(eachindex(v), idx_l)

However, fwiw, using deleteat!() with collect() seems to run much faster than setdiff() without it:

function partition_vec(v, a)
    idx_l = findall(<(a), v)
    idx_h = deleteat!(collect(axes(v,1)), idx_l)
    (idx_l, idx_h) 
end

Faster with collect? That must be a bug :face_with_raised_eyebrow:

1 Like

Thank you for the very useful advice!

So, I wonder where one finds a documentation that includes the interfaces?

The main Julia website offers an extremely friendly and comprehensive set of “tutorials”. That’s where I learned about setdiff. But all the examples, as far as I recall, use vectors for its arguments.

It would be nice if there is a set of documentation where one finds something like

setdiff(Vector{T}, . . . ) where T
setdiff(UnitRange{T}, . . . ) where T
. . . .

I use a julia-mode on emacs and it shows something like this

. . . .
setdiff(s::Core.Any)
setdiff(itr::Core.Any, itrs::Vararg{Core.Any})

taken from the source code of the standard library, which isn’t very helpful as a description of the user interface of setdiff, except to people who have a lot of experience in implementing Julia libraries. (In hindsight, the name “itr” must mean “iterator”, which suggests that both UnitRange and Vector are treated like an iterator by setdiff, . . .)

I don’t really think that’s going to happen. Look at the documentation of setdiff Collections and Data Structures · The Julia Language

setdiff(s, itrs...)
Construct the set of elements in s but not in any of the iterables in itrs. Maintain order with arrays.

It describes a completely generic function, which is how it should be.

1 Like