Array of tuples

I have an array A of tuples, Array{Tuple{Float64,Float64},1}, but am not able to make two arrays out of it like this:
I1 = A[:][1]
I2 = A[:][2]
even though I can access the individual elements with A[i][j].

Please give a minimal working example in the future (in this case, a small example of your array A). Also, please quote your code with backticks.

One solution is

I1 = first.(A)
I2 = last.(A)

Another is

I1 = [x[1] for x in A]
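A small self-contained sketch of both suggestions, using a made-up vector A with the element type from the question, Tuple{Float64,Float64} (the sample values are arbitrary):

```julia
# Hypothetical sample data with the same element type as in the question
A = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]

# Broadcast `first`/`last` over the vector: one call per tuple
I1 = first.(A)
I2 = last.(A)

# Equivalent comprehensions; indexing by position also generalizes to
# longer tuples, where `last` would pick the final element instead
J1 = [x[1] for x in A]
J2 = [x[2] for x in A]

println(I1)  # [1.0, 2.0, 3.0]
println(I2)  # [10.0, 20.0, 30.0]
```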

Dear David,

Thank you for providing a solution. I am an experienced C and C++ programmer but a newcomer to Julia. Is there a deeper reason why A[:][1] does not work to make an array of the first elements of the tuples?

Kind regards,

Jacques

For a vector, A[:] is the same as A regardless of what is inside: it indexes out all the elements (as a copy).

julia> A=[(1, 2), (2, 3), (3, 4)]
3-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (2, 3)
 (3, 4)

julia> A[:]
3-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (2, 3)
 (3, 4)

so A[:][1] is the same as A[1].
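One way to check this at the REPL: for a vector, A[:] builds a new vector holding the same elements (a copy, not a view), so the trailing [1] just picks its first element, the tuple (1, 2):

```julia
A = [(1, 2), (2, 3), (3, 4)]

B = A[:]            # a copy of all elements, same type and order
println(B == A)     # true: same contents
println(B === A)    # false: a distinct array object
println(A[:][1])    # (1, 2), i.e. the first tuple, not the first components
```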

If you have a two-dimensional array you can extract columns with colon indexing though.

julia> A = [1 2;2 3;3 4]
3×2 Array{Int64,2}:
 1  2
 2  3
 3  4

julia> A[:, 1]
3-element Array{Int64,1}:
 1
 2
 3
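If a matrix is what you actually want, a two-index comprehension can build one directly from the vector of tuples, after which column slicing works as above. A sketch, assuming every tuple has the same length (2 here) and a common element type:

```julia
A = [(1, 2), (2, 3), (3, 4)]

# M[i, j] == A[i][j]: row i holds tuple A[i], column j holds the j-th components.
# Assumes all tuples have length 2 and a common element type.
M = [t[j] for t in A, j in 1:2]

col1 = M[:, 1]     # first components
col2 = M[:, 2]     # second components
```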

Dear Gunnar,

Thanks for the additional info. What confused me is that, as you say, colon indexing can be used to extract columns of a two-dimensional array, A[:,1], so naively I thought/hoped that A[:][1] would do something similar for an array of tuples. I still have trouble really grasping the interplay (or non-interplay) of tuples and arrays. The original cause of my problem is that a function returning multiple values automatically returns them in a tuple. If they were returned in an array, none of this would happen.

Thanks,

Jacques

A missing piece of information might be that repeated brackets aren’t special syntax, just repeated indexing, as you can see by quoting the expression:

julia> :(A[i][j])
:((A[i])[j])
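The same nesting can be inspected programmatically. This small check, using only Meta.parse, shows that the outer indexing expression's first argument is itself the inner indexing expression, so A[:][1] means "index whatever A[:] returns with 1":

```julia
ex = Meta.parse("A[:][1]")

println(ex.head)      # ref: an indexing expression
println(ex.args[1])   # the inner indexing expression A[:], evaluated first
println(ex.args[2])   # 1, applied to whatever A[:] returned
```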

If you have control of the function you can choose to package your multiple values in a vector before returning them.

julia> f(i) = [i, i + 1]
f (generic function with 1 method)

julia> A = f.([1, 2, 3])
3-element Array{Array{Int64,1},1}:
 [1, 2]
 [2, 3]
 [3, 4]

This doesn’t in itself make it any easier to slice out a part than with a vector of tuples, but it makes it a little more convenient to repackage the data into a matrix, which can be sliced.

julia> B = reduce(hcat, A)
2×3 Array{Int64,2}:
 1  2  3
 2  3  4

julia> B[1, :]
3-element Array{Int64,1}:
 1
 2
 3
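If the function cannot be changed to return vectors, the same reduce(hcat, ...) trick can still be applied by first converting each tuple to a vector with collect. A sketch, assuming all tuples have the same length and element type:

```julia
A = [(1, 2), (2, 3), (3, 4)]   # e.g. what f.(xs) returns when f returns a tuple

# collect turns each tuple into a Vector; hcat then places them side by side as columns
B = reduce(hcat, collect.(A))  # 2×3 matrix: column k holds the components of A[k]

row1 = B[1, :]   # first components
row2 = B[2, :]   # second components
```

This allocates a temporary vector per tuple, so for large inputs the comprehension or getindex. approaches are likely cheaper.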

However, you are probably better off with some kind of dot vectorization or comprehension solution like David proposed.

Returning an array would be much more expensive than returning a tuple, as an array has a significant amount of overhead compared to a tuple (which can be exactly zero-overhead in many circumstances), which is why it’s not done. Also, I’m not convinced that that would help in your case anyway, as a vector-of-vectors is still a completely different structure than a 2D matrix, and something like A[:][1] would not work even if each element of A were a vector.
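One quick way to see part of this difference: a tuple of Float64s is an isbits type (a plain immutable value that can be stored inline and often kept on the stack or in registers), while a Vector is a mutable, heap-allocated object. A minimal check:

```julia
# Tuples of primitive values are plain bits: no object header needed
println(isbitstype(Tuple{Float64,Float64}))   # true
println(sizeof((1.0, 2.0)))                    # 16: just the two Float64 fields

# A Vector is a mutable object whose data is stored separately
println(isbitstype(Vector{Float64}))           # false
```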

Maybe you can describe more about what the actual problem you’re trying to solve is?

Dear Robin,

Thanks for taking the time to reply to this. I understand your point about an array of arrays not being a matrix. The actual problem is that I have a function f(s) computing various measurements for a specific parameter value s. I am calling this function on an array of parameter values using f.(s_array). What I want to end up with is several arrays of measurements, which will effectively hold m1(s_array), m2(s_array), …, where m1, m2, … are my measurements.

What I could do is define a struct, I guess, but I wanted to avoid that complication. Or, as Gunnar just wrote, I could put the measurements in an array in the function and return that array. In the latter case I will end up with an array of arrays, while in the former case I would end up with an array of structs… not sure if that is any better. At the end of the day I want to plot each measurement versus the parameter values. This seems like a very standard problem.
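A lightweight middle ground between a bare tuple and a full struct is a NamedTuple: the function still returns a cheap immutable value, but each measurement gets a name. In this sketch, measure, m1 and m2 are hypothetical stand-ins for the real function and measurements, chosen to mix Float64 and complex results as described later in the thread:

```julia
# Hypothetical measurement function returning a NamedTuple instead of a plain tuple
measure(s) = (m1 = s^2, m2 = complex(s, -s))

s_array = [0.5, 1.0, 1.5]
results = measure.(s_array)       # Vector of NamedTuples

# Pull each measurement out by name; each result vector keeps a concrete element type
m1 = [r.m1 for r in results]      # Float64 values
m2 = [r.m2 for r in results]      # ComplexF64 values
```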

Cheers,

Jacques

I see, thanks. One easy solution is what @dpsanders suggested above, using a comprehension:

julia> function f(x)
         x, 2x
       end
f (generic function with 1 method)

julia> xs = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> ys = f.(xs)
3-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (2, 4)
 (3, 6)

julia> [y[1] for y in ys]
3-element Array{Int64,1}:
 1
 2
 3

julia> [y[2] for y in ys]
3-element Array{Int64,1}:
 2
 4
 6

This should be pretty efficient and easy to generalize. If you don’t want to use the comprehension syntax, you can also broadcast the getindex function (that’s the function that’s called when you write foo[bar]):

julia> getindex.(ys, 1)
3-element Array{Int64,1}:
 1
 2
 3

julia> getindex.(ys, 2)
3-element Array{Int64,1}:
 2
 4
 6
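The getindex. version generalizes to any number of returned values, including values of different types. Here is a minimal sketch of a helper, unzip (a hypothetical name, not a Base function), that returns one vector per tuple position, assuming all tuples have the same length:

```julia
# Hypothetical helper: split a vector of equal-length tuples into a tuple of vectors.
# Each output vector gets its own element type.
unzip(ys) = ntuple(j -> getindex.(ys, j), length(first(ys)))

ys = [(1.0, 1.0 + 2.0im), (2.0, 3.0 + 4.0im), (3.0, 5.0 + 6.0im)]
xs, zs = unzip(ys)   # xs holds the Float64 parts, zs the complex parts
```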

If you want to get really fancy, there’s one more trick you can try, as long as all of your returned values are of the same type and that type is isbits (so, a primitive immutable type like Float64, or a struct or tuple made up of primitive immutables). If all of your returned values are Float64 or Int, for example, then this will work. All we have to do is rely on the fact that isbits types are stored inline in arrays, so we can freely transform an array of tuples into a matrix with reinterpret():

julia> r = reinterpret(Int, ys, (2, 3))
2×3 Array{Int64,2}:
 1  2  3
 2  4  6

julia> r[1, :]
3-element Array{Int64,1}:
 1
 2
 3

julia> r[2, :]
3-element Array{Int64,1}:
 2
 4
 6

Edit: be warned that it’s easy to forget the column-major layout order and get the reinterpret size wrong. I had to edit this post because I got it wrong myself!

Which of these is appropriate will depend on your particular needs, and I suggest using BenchmarkTools.jl (JuliaCI/BenchmarkTools.jl on GitHub, a benchmarking framework for the Julia language) to determine which performs best for your case.

Dear Robin,

Thanks for the detailed explanation. To me it seems that the getindex function gets really close to what I was originally looking for!

The reinterpret won’t work in this case because the return values are of different types (floats and complex).

Cheers,

Jacques

This is extremely helpful. Alternatively, is there an efficient way to do this using view? I put together the following benchmark comparing a comprehension, view, and getindex, and was surprised by the results. I expected view to be faster for large arrays. Am I missing something, or is there a good reason for view being slow? Why are there so many allocations?

using BenchmarkTools

N = 10
data = [rand(3) for i ∈ 1:N]

@btime [q[2] for q in $data]
@btime getindex.($data, 2)
@btime view.($data,2)

returns

57.476 ns (2 allocations: 176 bytes)
48.976 ns (1 allocation: 160 bytes)
108.226 ns (11 allocations: 640 bytes)

For N = 1000000, I get:

7.812 ms (3 allocations: 7.63 MiB)
7.815 ms (2 allocations: 7.63 MiB)
35.088 ms (1000002 allocations: 53.41 MiB)

Don’t worry about the vector-of-tuples part of this, just look at what each scalar operation you’re doing is. getindex.(data, 2) calls getindex(d, 2) for each d in data, while view.(data, 2) calls view(d, 2) for each d.

In your case, d is a 3-element vector, so getindex(d, 2) returns the second element, which is just a Float64. That’s an extremely cheap operation. On the other hand, view(d, 2) actually constructs a view object (a SubArray) representing the second element of d. While a view may be cheaper to allocate than a new Array (that’s the whole point, after all), it is not cheaper to allocate than a single Float64.
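This is easy to confirm: view with a single integer index returns a zero-dimensional SubArray that wraps the parent vector, whereas getindex returns the bare number. A small sketch:

```julia
d = [0.1, 0.2, 0.3]

x = getindex(d, 2)   # just the Float64 value 0.2
v = view(d, 2)       # a 0-dimensional SubArray that refers back to d

println(ndims(v))    # 0
println(v[])         # 0.2, read through the view with empty brackets

d[2] = 9.9           # mutating the parent is visible through the view...
println(v[])         # 9.9 ...while x, a plain copy of the value, is unchanged
```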

On the other hand, if you did something like:

data = [rand(10000) for i in 1:N]
@btime getindex.($data, Ref(1:1000))
@btime view.($data, Ref(1:1000))

then you might see view coming out ahead, since it will avoid creating a 1000-element copy of each element of data.

This makes sense. Correct me if I’m wrong, but in my usage of view.() I was essentially creating an array of N views rather than a single view into an N-length array.
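To get a single view across all N vectors, the data first has to live in one array. A sketch, assuming it is acceptable to copy the vectors once into a 3×N matrix (N = 5 here is arbitrary):

```julia
N = 5
data = [rand(3) for i in 1:N]

M = reduce(hcat, data)    # one 3×N matrix: a single copy of all the data

many = view.(data, 2)     # N separate 0-dimensional views, one per vector
row  = view(M, 2, :)      # one view of length N into row 2 of M, no further copy
```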

Also broadcasting works fine:

julia> A = [(1, 2), (2, 3), (3, 4)];

julia> I1 = (x->x[1]).(A)
3-element Array{Int64,1}:
 1
 2
 3

julia> I2 = (x->x[2]).(A)
3-element Array{Int64,1}:
 2
 3
 4

It seems the API changed at some point before v1.5: reinterpret no longer accepts the third (dimensions) argument, so we have to combine reinterpret and reshape to achieve the same result.

julia> r = reshape(reinterpret(Int, ys), (2, 3))
2×3 reshape(reinterpret(Int64, ::Array{Tuple{Int64,Int64},1}), 2, 3) with eltype Int64:
 1  2  3
 2  4  6

Here, reinterpret(Int, ys) first yields a 1D array.
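A self-contained version of the same idea for current Julia, with an isbitstype check guarding the assumption that the tuples are stored inline. The collect call is optional; it copies the lazy reshaped/reinterpreted wrapper into an ordinary Matrix:

```julia
f(x) = (x, 2x)
xs = [1, 2, 3]
ys = f.(xs)                          # vector of Tuple{Int,Int}

# reinterpret only makes sense for inline-stored (isbits) elements of a single type
@assert isbitstype(eltype(ys))

r = reshape(reinterpret(Int, ys), 2, :)   # 2×3, column k holds the tuple ys[k]
m = collect(r)                             # optional: a plain Matrix{Int} copy

first_components  = r[1, :]
second_components = r[2, :]
```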

You’re right–this changed in v1.0.

Late to the party but I like this construction:

julia> A = [(1, 2), (2, 3), (3, 4)]
3-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (2, 3)
 (3, 4)

julia>  tmp = map(x -> getindex.(A, x), 1:2)
2-element Array{Array{Int64,1},1}:
 [1, 2, 3]
 [2, 3, 4]

julia> out = reduce(hcat, tmp)
3×2 Array{Int64,2}:
 1  2
 2  3
 3  4

Initializing an array of tuples from arrays can be done as follows:

function f(x,y,i)
    x[i], y[i]
end
U1=[1,2,3,4]
U2=[5,6,7,8]
f.(U1,U2,1...)

julia> function f(x,y,i)
           x[i], y[i]
       end
f (generic function with 1 method)

julia> U1=[1,2,3,4]
4-element Vector{Int64}:
 1
 2
 3
 4

julia> U2=[5,6,7,8]
4-element Vector{Int64}:
 5
 6
 7
 8

julia> f.(U1,U2,1...)
4-element Vector{Tuple{Int64, Int64}}:
 (1, 5)
 (2, 6)
 (3, 7)
 (4, 8)
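For this particular case, Base also offers more direct ways to pair two vectors element-wise: broadcasting the tuple function, or collecting a zip. A short sketch with the same U1 and U2:

```julia
U1 = [1, 2, 3, 4]
U2 = [5, 6, 7, 8]

T1 = tuple.(U1, U2)        # broadcast `tuple` over both vectors
T2 = collect(zip(U1, U2))  # zip pairs elements lazily; collect materializes them

# Both are vectors of Tuple{Int,Int}: [(1, 5), (2, 6), (3, 7), (4, 8)]
```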