Multi column indexing

I’m still new to Julia, and there are some basics I’m clearly not understanding. Take the following example, which has confused me all morning.

storage_indx_mult = hcat([12, 165, 55, 66, 89, 101, 2], [68, 135, 409, 222, 6, 818, 7])
column_index_mult = [1,1,2,2,1,1,2]
id_index = [6,4,5,2,1,7,3]
test_ids_mult = storage_indx_mult[id_index, column_index_mult]

In this case I’m expecting to get back a single vector:

test_ids_mult_expected = [101, 66, 6, 135, 12, 2, 409]

But Julia gives back a full matrix:

test_ids_mult = storage_indx_mult[id_index, column_index_mult]
7×7 Matrix{Int64}:
 101  101  818  818  101  101  818
  66   66  222  222   66   66  222
  89   89    6    6   89   89    6
 165  165  135  135  165  165  135
  12   12   68   68   12   12   68
   2    2    7    7    2    2    7
  55   55  409  409   55   55  409

As I said, I’m clearly not understanding something about indexing in Julia. I think coming from a Python background is confusing my thinking. How do I extract the single vector I want from the two index vectors?

You can try this:

getindex.(Ref(storage_indx_mult), id_index, column_index_mult)
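To unpack why this works: the dot broadcasts getindex over id_index and column_index_mult element by element, while Ref(...) makes broadcasting treat the matrix as a single value instead of iterating over it. A minimal sketch using the data from the question (the names paired and looped are only for illustration):

```julia
storage_indx_mult = hcat([12, 165, 55, 66, 89, 101, 2], [68, 135, 409, 222, 6, 818, 7])
column_index_mult = [1, 1, 2, 2, 1, 1, 2]
id_index = [6, 4, 5, 2, 1, 7, 3]

# Ref wraps the matrix so broadcasting treats it as a scalar;
# the dot then calls getindex(storage_indx_mult, i, j) once per (i, j) pair.
paired = getindex.(Ref(storage_indx_mult), id_index, column_index_mult)

# The broadcast is roughly shorthand for this explicit loop over paired indices.
looped = similar(id_index)
for k in eachindex(id_index, column_index_mult)
    looped[k] = storage_indx_mult[id_index[k], column_index_mult[k]]
end
```

Both paired and looped should hold the expected vector from the question.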

Thanks. I would never have figured this out. It seems rather unintuitive compared to what I’m used to in Python.

This alternative may seem more intuitive (and closer to Python):

[storage_indx_mult[i, j] for (i, j) in zip(id_index, column_index_mult)]

Note that, in Julia, your storage_indx_mult[id_index, column_index_mult] is actually equivalent to [storage_indx_mult[i, j] for i in id_index, j in column_index_mult].
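A small sketch of that equivalence, reusing the data from the question. Indexing with two vectors selects every combination of row and column, whereas NumPy pairs two integer arrays element by element, which is probably where the surprise comes from. The variable names outer, comprehension and diagonal are only for illustration:

```julia
storage_indx_mult = hcat([12, 165, 55, 66, 89, 101, 2], [68, 135, 409, 222, 6, 818, 7])
column_index_mult = [1, 1, 2, 2, 1, 1, 2]
id_index = [6, 4, 5, 2, 1, 7, 3]

# Two index vectors give every (row, column) combination: a 7×7 matrix.
outer = storage_indx_mult[id_index, column_index_mult]

# The same matrix written as a two-dimensional comprehension.
comprehension = [storage_indx_mult[i, j] for i in id_index, j in column_index_mult]

# The pairwise lookups are the diagonal of that matrix,
# since entry (k, k) uses id_index[k] together with column_index_mult[k].
diagonal = [outer[k, k] for k in axes(outer, 1)]
```

So the matrix contains the wanted values, but computing all 49 entries to keep 7 is wasteful, which is why the pairwise approaches are preferable.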

Yet another option:

storage_indx_mult[CartesianIndex.(id_index, column_index_mult)]
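For anyone wondering what that line does: broadcasting CartesianIndex over the two vectors builds one (row, column) index per pair, and indexing a matrix with a vector of such indices returns one element per index. A short sketch with the data from the question (idx and picked are illustrative names):

```julia
storage_indx_mult = hcat([12, 165, 55, 66, 89, 101, 2], [68, 135, 409, 222, 6, 818, 7])
column_index_mult = [1, 1, 2, 2, 1, 1, 2]
id_index = [6, 4, 5, 2, 1, 7, 3]

# One CartesianIndex per (row, column) pair; idx is a Vector{CartesianIndex{2}}.
idx = CartesianIndex.(id_index, column_index_mult)

# The first pair is row 6, column 1.
first_pair = idx[1]

# Indexing with the vector of CartesianIndex values picks one element per pair.
picked = storage_indx_mult[idx]
```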

Thanks for these other options. The CartesianIndex approach seems the most straightforward to me, as I find comprehensions easy to get wrong.

I did a quick check with BenchmarkTools to compare performance. On a typical dataset I would be working with, about 100K rows, the getindex version was the quickest:

@btime pIDs[CartesianIndex.(ids[:,1], cc2[:, 1].+1)]
216.600 μs (14 allocations: 1.78 MiB)
@btime [pIDs[i, j] for (i, j) in zip(ids[:,1], cc2[:, 1].+1)]
1.886 ms (99894 allocations: 2.91 MiB)
@btime getindex.(Ref(pIDs), ids[:,1], cc2[:, 1].+1)
107.900 μs (13 allocations: 1010.94 KiB)
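Before comparing timings, it may be worth confirming that all three approaches return the same result on data of that size. The sketch below uses made-up stand-in data: the shape of pIDs and the names rows and cols are assumptions, not the real dataset.

```julia
# Hypothetical stand-in data; sizes and value ranges are assumptions.
n = 100_000
pIDs = rand(1:10^6, n, 2)   # lookup matrix with two columns
rows = rand(1:n, n)         # one row index per lookup
cols = rand(1:2, n)         # one column index per lookup

# The three pairwise-indexing approaches from this thread.
via_broadcast = getindex.(Ref(pIDs), rows, cols)
via_comprehension = [pIDs[i, j] for (i, j) in zip(rows, cols)]
via_cartesian = pIDs[CartesianIndex.(rows, cols)]

# All three should agree element by element.
all_agree = via_broadcast == via_comprehension == via_cartesian
```

The large allocation count for the comprehension may partly come from benchmarking with non-constant global variables. The BenchmarkTools documentation recommends interpolating such variables with `$` inside @btime, which could change the ranking.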