is it possible to have a vector with custom predefined non-uniform indexing. Like the following made-up example, I’ve a vector with the values 1, 2, 3 of the indices 29, 360, 400.
x = [1, 2, 3]
for (i, v) in enumerate(x)
@show i
end
i = 29
i = 360
i = 400
I could use a dictionary with the custom indices as keys, but iterating over a dictionary is on my machine somehow much slower than a vector, especially for big dimensions.
You’re looking for a data structure, but to get help you need to tell more about your requirements. For example, do you care about the cost of updating the structure (insertions, deletions, …)? Does it need to be mutable or immutable? I assume you want constant-time access to any element to be possible. Or do you just need iteration?
Give more illustrative examples.
for example: I want to calculate vector z that depends on other parameters, say vector p and q. That problem could look like this:
p = [p₁, p₂, p₃]
q = [q₁, q₂, q₃]
z = [f(p₁), g(q₁), g(q₂), f(p₂), g(q₃), f(p₃)]
I wanted to give p the indices [1, 4, 6] and q the indices [2, 3, 5], so that I could do such a for loop:
for (i, v) in enumerate(q)
z[i] = g(v)
end
that’s why I was asking, if I could give p and q custom indices that would make my loops easier.
p and q are defined once and are not to be changed (no deletions or insertions, …). Also immutable, I just want to access them while looping to get a specific value and index.
thanks for your answer. that’s actually what is currently implemented to solve the problem, to have an extra vector pi with the needed indices.
But if I would need like 10 parameter vectors to compute z, I would also need to have 10 extra vectors to store the indices. That would make the namespace a bit crowded, especially that I’m eventually going to store these parameter vectors in a struct like this:
struct Parameters
p
pi
q
qi
...
end
that’s why I came to the question, if it it possible to combine p and pi into one vector with custom indices?
Iterating over large dimensions using dictionaries is for some reason slower than iterating using a vector. The example above is simple, the real problem could have a vector z with a length in the thousands and be computed millions of times. Therefore, performance is critical.
here is what I’ve used to check the speed of iterating over a dict and vector:
julia> function voo(V)
for (i, v) in enumerate(V)
y = v
z = i
end
return nothing
end
voo (generic function with 1 method)
julia> function doo(D)
for (i, v) in D
y = v
z = i
end
return nothing
end
doo (generic function with 1 method)
julia> D = Dict(i => 2i for i in 1:1e6);
julia> V = [2i for i in 1:1e6];
julia> @btime doo(D)
10.117 ms (0 allocations: 0 bytes)
julia> @btime voo(V)
24.180 ns (0 allocations: 0 bytes)
Please correct me, if I’ve done something wrong while testing.
Your computer is not able to iterate over a million elements and perform work on them in 24ns, that’s impossible.
What’s happening instead is that since your function doesn’t do any work, the programs just skips the whole thing and does nothing. You can probably increase your loop bounds to 1e12 with no change in runtime.
For some reason, the compiler is not able to determine that it doesn’t have to do anything when dealing with a Dict, so then it actually performs iteration.
You must somehow ‘store’ the work and force the program to return the result (perhaps you can accumulate a sum of the elements.)
One more thing: in case you are not aware, 1:1e6 is a range of floats, since 1e6 is a float. Maybe you intended this, but if not, you can write 1:10^6 or 1:1_000_000 instead.
you were right regarding the 24ns, still after adjusting both for loops to do some work, iterating over a vector seems to be faster
julia> function voo(V)
sum = 0
for (i, v) in enumerate(V)
sum += v
end
return sum
end
voo (generic function with 1 method)
julia> function doo(D)
sum = 0
for (i, v) in D
sum += v
end
return sum
end
doo (generic function with 1 method)
julia> D = Dict(i => 2i for i in 1:1_000_000);
julia> V = [2i for i in 1:1_000_000];
julia> @btime doo(D)
9.969 ms (1 allocation: 16 bytes)
1000001000000
julia> @btime voo(V)
255.708 μs (1 allocation: 16 bytes)
1000001000000
Note that ParametersVector is technically not compliant with the AbstractArray interface since the interface requires that indices be consecutive. I recommend not extending AbstractVector but continue to define the appropriate Base methods as needed.
I’m not sure I understood you there. As I’m not that expert regarding abstract types. could you explain to me what problems could arise from such an implementation?
The AbstactArray interface definition as documented here states that axes(A) where A is an AbstractArray must return " a tuple of AbstractUnitRange{<:Integer} of valid indices". That is each axis of your array must expressible as a unit range such as 2:5. Others writing methods that accept AbstractArray as a type will assume that there are consecutive indices and may then fail when passed your type.
Actually, looking at the last definition you gave, I see that the actual indices are actually consecutive so maybe there is not a problem.