Finding array indices without creating an array using vectors?

taeukseo · February 27, 2021, 5:48am

I apologize for the poor phrasing of the question - it sounds like an oxymoron!

I have been struggling to write a function F which, given:

vectors of sizes N1, N2, N3… etc
a number 1<= i <= N1 * N2 * N3,

returns indices x1, x2, x3… etc. that correspond to i.

For example, assuming vectors

A = [1, 2], B = [3, 4], C = [5,6],

we can imagine a combined array ABC.

ABC =   1   3   5
        2   3   5
        1   4   5
        2   4   5
        1   3   6
        2   3   6
        1   4   6
        2   4   6

Then, given an input of i = 2, the elements on the second row in ABC are [2,3,5].
These correspond to A[2] = 2, B[1] = 3, C[1] = 5.

The function F would return the indices 2,1,1. ie.

2,1,1 = F(2,A,B,C)

I was wondering if there were functions in Julia that can do the same job as the function that I described above? I am especially hoping to find indices without actually creating the array ABC to minimize memory use.

jling · February 27, 2021, 6:33am

julia> function F(idx::Integer, arys...)
           res = zeros(Int, length(arys))
           for (i, A) in enumerate(arys)
              l = length(A)
              res[i] = mod1(idx, l)
              idx = cld(idx, l)
           end
           res
       end

I think this does what you want, not 100% sure, my brain is small and it’s kinda late. The idea is to think of each array as n-bit counter. We start with the bit that cycles the quickest, find out how many times it cycled, use that cycle number for the next bit.

taeukseo · February 27, 2021, 6:34am

I have made a naive attempt by writing down a function as follows.

function returnIndex(index, N1, N2, N3)

    index0 = 1

    tmp10 = div(index - 0.05, index0)
    tmp11 = mod(tmp10, N1)
    A1 = convert(Int, floor(tmp11))  + 1
    index0 = index0*N1
    
    tmp20 = div(index-0.05, index0)
    tmp21 = mod(tmp20, N2)
    A2 = convert(Int, floor(tmp21)) + 1
    index0 = index0*N2
    
    tmp30 = div(index-0.05, index0)
    tmp31 = mod(tmp30, N3)
    A3 = convert(Int, floor(tmp31)) + 1
    index0 = index0*N3

    return A1, A2, A3
end

To make the code more generalizable, I have tried to write the code in a recursive format.

I was wondering if the above function is too expensive for computing?

Also, is there an easy way to let the function accept variable number of arguments?

taeukseo · February 27, 2021, 6:36am

jling:

   res_T = promote_type(eltype.(arys)...)
           res = Vector{res_T}(undef, length(arys))
           for (i, A) in enumerate(arys)
              l = length(A)
              res[i] = mod1(idx, l)
              idx = cld(idx, l)
           end
           res
       end

Thanks!! I will try out the code.

jules · February 27, 2021, 6:58am

I think this does what you want:

function f(i, vecs::AbstractVector...)
    cis = CartesianIndices(length.(vecs))
    ci = cis[i]
end

We pretend we have an n-dimensional array where each vector corresponds to one dimension, with CartesianIndices describing all indices of that array in a lazy manner. Then you can index with an integer into that and will get the CartesianIndex at that position.

julia> A = [1, 2]; B = [3, 4]; C = [5,6]
2-element Vector{Int64}:
 5
 6

julia> f(2, A, B, C)
CartesianIndex(2, 1, 1)

rafael.guerra · February 27, 2021, 10:18am

Another possibility for collecting the indices sought, edited for variable number of input arrays:

function g(n, vecs...)
    it = [1:length(v) for v in vecs]
    collect(Iterators.product(it...))[n]
end

A = [1, 2]
B = [3, 4, 5]
C = [6, 7, 8, 9, 10]

julia> g(2, A, B, C)
(2, 1, 1)
julia> g(2, B, C)
(2, 1)
julia> g(2, B)
(2,)

Just noticed that this function runs out of memory for large arrays. For sake of understanding, could someone point out the issue(s)?
Thanks in advance.

sijo · February 27, 2021, 1:15pm

This creates an array that contains the Cartesian product of all the iterators. Only after that (possibly huge) array is created, does [n] extract one value from it.

You could do something like first(Iterators.drop(Iterators.product(it...), n-1)) to avoid creating the array.

tbeason · February 27, 2021, 3:46pm

Is this for complete polynomials? If so, I remember seeing a function that did this in one of the related packages providing orthogonal polynomial utilities.

DNF · February 27, 2021, 4:48pm

floor(Int, tmp11)

rafael.guerra · February 27, 2021, 6:38pm

@sijo, thanks for the explanation, it makes sense. However, this function will be very slow for very large arrays.
The direct computation by @jling seems to be more natural/a more direct solution.
@jules’ elegant CartesianIndices solution ran into problems in the first example below, though. It’s as if the choice of the integer types for the array indices was not appropriate for their Cartesian product?

function f1(n::Integer, vecs...)  # by @jling
    res = Vector{Int}(undef, length(vecs))
    @inbounds for i in eachindex(vecs)
        l = length(vecs[i])
       res[i] = mod1(n, l)
       n = cld(n, l)
    end
    res
end

function f2(n, vecs::AbstractVector...)  # by @jules
    cis = CartesianIndices(length.(vecs))
    ci = cis[n]
end

using BenchmarkTools
A = randn(1_000_000);
B = randn(2_000_000);
C = randn(5_000_000);
@btime f1(11_000_000_000,$A,$B,$C)   # 41 ns (1 allocation: 112 bytes)
@btime f2(11_000_000_000,$A,$B,$C)   # ERROR: BoundsError: attempt to access 1000000×2000000×5000000

A = randn(10_000_000);
B = randn(20_000_000);
C = randn(50_000_000);
@btime f1(11_000_000_000,$A,$B,$C)   #  43 ns (1 allocation: 112 bytes)
@btime f2(11_000_000_000,$A,$B,$C)   #  310 ns (6 allocations: 192 bytes)

jules · February 27, 2021, 6:57pm

That’s probably because:

julia> 1_000_000 * 2_000_000 * 5_000_000
-8446744073709551616

The bounds check integer overflows. I don’t know if that’s worth an issue, because “real” arrays probably never have sizes that can cause that

jling · February 27, 2021, 6:57pm

julia> function f1(n, vecs...)  # by @jling
           res = Vector{Int}(undef, length(vecs))
           @inbounds for i in eachindex(vecs) # this inbounds can already be inferred by compiler I think
              l = length(vecs[i])
              res[i] = mod1(n, l)
              n = cld(n, l)
           end
           res
       end
f1 (generic function with 1 method)

didn’t know we’re competing for last drop of performance but here’s a much faster version.

rafael.guerra · February 27, 2021, 7:08pm

@jling, thank you. This is just for Julia learning purposes.

Fixed the code above with your new turbo version and given the new running times, I think logic finally prevailed.

taeukseo · February 27, 2021, 7:46pm

Ah! This was for parallelization of a grid search problem. I was trying to find values for each element of a large functional array.

tbeason · February 27, 2021, 8:03pm

BTW, hi Taeuk

taeukseo · February 27, 2021, 10:12pm

Hi Tyler! What a small world!!

Topic		Replies	Views
Indexing of vector of arrays New to Julia	2	227	March 22, 2024
Questions of vector New to Julia question , arrays	9	324	September 22, 2022
Questions about vector New to Julia question , arrays	5	355	October 4, 2022
How to return the index of element in Array in Integer not CartesianIndex? General Usage question	2	863	September 6, 2021
Vector{Vector} indices General Usage indexing , arrays	22	2779	September 19, 2022

Finding array indices without creating an array using vectors?

Related topics