A vector with custom indices

moataz-sabry · August 5, 2022, 10:33pm

is it possible to have a vector with custom predefined non-uniform indexing. Like the following made-up example, I’ve a vector with the values 1, 2, 3 of the indices 29, 360, 400.

x = [1, 2, 3] 
for (i, v) in enumerate(x)
       @show i
       end
i = 29
i = 360
i = 400

I could use a dictionary with the custom indices as keys, but iterating over a dictionary is on my machine somehow much slower than a vector, especially for big dimensions.

nsajko · August 5, 2022, 10:47pm

You’re looking for a data structure, but to get help you need to tell more about your requirements. For example, do you care about the cost of updating the structure (insertions, deletions, …)? Does it need to be mutable or immutable? I assume you want constant-time access to any element to be possible. Or do you just need iteration?
Give more illustrative examples.

moataz-sabry · August 5, 2022, 11:51pm

for example: I want to calculate vector z that depends on other parameters, say vector p and q. That problem could look like this:

p = [p₁, p₂, p₃]
q = [q₁, q₂, q₃]

z = [f(p₁), g(q₁), g(q₂), f(p₂), g(q₃), f(p₃)]

I wanted to give p the indices [1, 4, 6] and q the indices [2, 3, 5], so that I could do such a for loop:

for (i, v) in enumerate(q)
    z[i] = g(v)
end

that’s why I was asking, if I could give p and q custom indices that would make my loops easier.

p and q are defined once and are not to be changed (no deletions or insertions, …). Also immutable, I just want to access them while looping to get a specific value and index.

nsajko · August 6, 2022, 12:42am

Why not just do something like this:

p = zip([1, 4, 6], [0.3, 0.75, 0.5])
q = zip([2, 3, 5], [0.2, 0.3, 0.1])
f = sin
g = cos
z = zeros(Float64, 6)

then your loop could look like this:

for (i, v) in q
  z[i] = g(v)
end

Or you could do this:

f = sin
g = cos
pi = [1, 4, 6]
pv = map(f, [0.3, 0.75, 0.5])
qi = [2, 3, 5]
qv = map(g, [0.2, 0.3, 0.1])
z = zeros(Float64, 6)

And then loop like this:

for (i, j) in enumerate(qi)
  z[j] = qv[i]
end

mkitti · August 6, 2022, 2:26am

A Dict may also apply:

julia> d = Dict((29, 360, 400) .=> [1,2,3])
Dict{Int64, Int64} with 3 entries:
  29  => 1
  400 => 3
  360 => 2

julia> for (i, v) in pairs(d)
           @show i
       end
i = 29
i = 400
i = 360

julia> d[29]
1

julia> d[360]
2

julia> d[400]
3

rafael.guerra · August 6, 2022, 6:53am

For the simple example provided you don’t need to use loops, broadcasting f and g would be easier:

p, q = rand(3), rand(3)
ip, iq = [1,4,6], [2,3,5]
f, g = sin, cos
z = Vector{Float64}(undef,6)
z[ip] .= f.(p)
z[iq] .= g.(q)

moataz-sabry · August 6, 2022, 9:57am

thanks for your answer. that’s actually what is currently implemented to solve the problem, to have an extra vector pi with the needed indices.

But if I would need like 10 parameter vectors to compute z, I would also need to have 10 extra vectors to store the indices. That would make the namespace a bit crowded, especially that I’m eventually going to store these parameter vectors in a struct like this:

struct Parameters
p
pi
q
qi
...
end

that’s why I came to the question, if it it possible to combine p and pi into one vector with custom indices?

moataz-sabry · August 6, 2022, 10:10am

Iterating over large dimensions using dictionaries is for some reason slower than iterating using a vector. The example above is simple, the real problem could have a vector z with a length in the thousands and be computed millions of times. Therefore, performance is critical.

here is what I’ve used to check the speed of iterating over a dict and vector:

julia> function voo(V)
           for (i, v) in enumerate(V)
               y = v
               z = i
           end
           return nothing
       end
voo (generic function with 1 method)

julia> function doo(D)
           for (i, v) in D
               y = v
               z = i
           end
           return nothing
       end
doo (generic function with 1 method)

julia> D = Dict(i => 2i for i in 1:1e6);

julia> V = [2i for i in 1:1e6];

julia> @btime doo(D)
  10.117 ms (0 allocations: 0 bytes)

julia> @btime voo(V)
  24.180 ns (0 allocations: 0 bytes)

Please correct me, if I’ve done something wrong while testing.

DNF · August 6, 2022, 10:55am

Your computer is not able to iterate over a million elements and perform work on them in 24ns, that’s impossible.

What’s happening instead is that since your function doesn’t do any work, the programs just skips the whole thing and does nothing. You can probably increase your loop bounds to 1e12 with no change in runtime.

For some reason, the compiler is not able to determine that it doesn’t have to do anything when dealing with a Dict, so then it actually performs iteration.

You must somehow ‘store’ the work and force the program to return the result (perhaps you can accumulate a sum of the elements.)

One more thing: in case you are not aware, 1:1e6 is a range of floats, since 1e6 is a float. Maybe you intended this, but if not, you can write 1:10^6 or 1:1_000_000 instead.

moataz-sabry · August 6, 2022, 12:18pm

you were right regarding the 24ns, still after adjusting both for loops to do some work, iterating over a vector seems to be faster

julia> function voo(V)
           sum = 0
           for (i, v) in enumerate(V)
               sum += v
           end
           return sum
       end
voo (generic function with 1 method)

julia> function doo(D)
           sum = 0
           for (i, v) in D
               sum += v
           end
           return sum
       end
doo (generic function with 1 method)

julia> D = Dict(i => 2i for i in 1:1_000_000);

julia> V = [2i for i in 1:1_000_000];

julia> @btime doo(D)
  9.969 ms (1 allocation: 16 bytes)
1000001000000

julia> @btime voo(V)
  255.708 μs (1 allocation: 16 bytes)
1000001000000

hendri54 · August 6, 2022, 12:37pm

A NamedArray would be an option. However, with integer indices access has to use the v[Name(1)] syntax.

mkitti · August 6, 2022, 2:09pm

What are your other requirements?

Will you always iterate over the elements in a certain order?
Does the order of iteration matter?
Are the values always unique?
Will you index either array directly? Do you require random access?

moataz-sabry · August 6, 2022, 2:37pm

I kept searching around and I came up with the following idea:

struct ParametersVector <: AbstractVector{Symbol}
    indices::Vector{Int64}
    values::Vector{Symbol}
end

Base.size(p::ParametersVector) = size(p.values)
Base.getindex(p::ParametersVector, i::Int) = p.values[i]

function Base.iterate(e::Base.Iterators.Enumerate{ParametersVector}, state=(1,))
    i, rest = state[1], Base.tail(state)
    n = iterate(e.itr, rest...)
    n === nothing && return n
    (e.itr.indices[i], n[1]), (i + 1, n[2])
end

indices = [1, 4, 6]
values = [:p₁, :p₂, :p₃]

p = ParametersVector(indices, values)
3-element ParametersVector:
 :p₁
 :p₂
 :p₃

for (i, v) in enumerate(p)
    @show i, v
end

(i, v) = (1, :p₁)
(i, v) = (4, :p₂)
(i, v) = (6, :p₃)

mkitti · August 6, 2022, 5:56pm

Note that ParametersVector is technically not compliant with the AbstractArray interface since the interface requires that indices be consecutive. I recommend not extending AbstractVector but continue to define the appropriate Base methods as needed.

moataz-sabry · August 7, 2022, 12:55pm

I’m not sure I understood you there. As I’m not that expert regarding abstract types. could you explain to me what problems could arise from such an implementation?

mkitti · August 7, 2022, 11:18pm

The AbstactArray interface definition as documented here states that axes(A) where A is an AbstractArray must return " a tuple of AbstractUnitRange{<:Integer} of valid indices". That is each axis of your array must expressible as a unit range such as 2:5. Others writing methods that accept AbstractArray as a type will assume that there are consecutive indices and may then fail when passed your type.

Actually, looking at the last definition you gave, I see that the actual indices are actually consecutive so maybe there is not a problem.

julia> pv = ParametersVector([2,4,6],[:a,:b,:c])
3-element ParametersVector:
 :a
 :b
 :c

julia> pv[1]
:a

julia> pv[2]
:b

julia> pv[3]
:c

If that behavior is sufficient, then I’m not sure why the pairs behavior does not work:

julia> mypv = [1,4,6] .=> [:a,:b,:c]
3-element Vector{Pair{Int64, Symbol}}:
 1 => :a
 4 => :b
 6 => :c

julia> for (i, v) in mypv
           @show i, v
       end
(i, v) = (1, :a)
(i, v) = (4, :b)
(i, v) = (6, :c)

julia> mypv[1]
1 => :a

julia> mypv[2]
4 => :b

julia> mypv[3]
6 => :c