Array of tuples

jcrbloch · June 5, 2018, 7:47am

I have an array A of tuples, Array{Tuple{Float64,Float64},1}, but I am not able to make two arrays out of it like this:
I1 = A[:][1]
I2 = A[:][2]
even though I can access the individual elements with A[i][j]

dpsanders · June 5, 2018, 7:49am

Please give a minimal working example in the future (in this case, a small example of your array A). Also, please quote your code with backticks.

One solution is

I1 = first.(A)
I2 = last.(A)

Another is

I1 = [x[1] for x in A]
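
For completeness, here is a minimal sketch of both approaches producing both arrays. The sample values in `A` are made up for illustration and are not from the original question.

```julia
# Hypothetical sample data standing in for the poster's array of tuples
A = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]

# Broadcasting `first` and `last` over the array (enough for 2-tuples)
I1 = first.(A)
I2 = last.(A)

# Equivalent comprehensions; these generalize to any tuple position
J1 = [x[1] for x in A]
J2 = [x[2] for x in A]
```

The comprehension form is the one to reach for when the tuples have more than two entries, since `first` and `last` only cover the two ends.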

jcrbloch · June 5, 2018, 10:04am

Dear David,

Thank you for providing a solution. I am an experienced C and C++ programmer but a newcomer to Julia. Is there a deeper reason why A[:][1] does not work to make an array with the first element of each tuple?

Kind regards,

Jacques

GunnarFarneback · June 5, 2018, 12:29pm

For a vector, A[:] is the same as A regardless of what is inside: it indexes out all elements.

julia> A=[(1, 2), (2, 3), (3, 4)]
3-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (2, 3)
 (3, 4)

julia> A[:]
3-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (2, 3)
 (3, 4)

so A[:][1] is the same as A[1].

If you have a two-dimensional array you can extract columns with colon indexing though.

julia> A = [1 2;2 3;3 4]
3×2 Array{Int64,2}:
 1  2
 2  3
 3  4

julia> A[:, 1]
3-element Array{Int64,1}:
 1
 2
 3
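
The point about `A[:][1]` can be checked directly. This is a small sketch reusing the vector of tuples from above; the variable names `whole`, `first_tuple`, `same_contents`, and `same_element` are only illustrative.

```julia
A = [(1, 2), (2, 3), (3, 4)]

# A[:] is a copy of the whole vector, so indexing it again picks one tuple
whole = A[:]
first_tuple = A[:][1]

# Both checks hold: the copy equals A, and A[:][1] is just A[1]
same_contents = whole == A
same_element = first_tuple === A[1]
```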

jcrbloch · June 5, 2018, 12:50pm

Dear Gunnar,

Thanks for the additional info. What confused me is that, as you say, colon indexing can be used to extract columns of a two-dimensional array, A[:,1], so naively I thought/hoped that A[:][1] would do something similar for an array of tuples. I still have trouble really grasping the interplay (or non-interplay) of tuples and arrays. The original cause of my problem is that a function returning multiple values automatically returns them in a tuple. If they were returned in an array, none of this would happen.

Thanks,

Jacques

GunnarFarneback · June 5, 2018, 1:10pm

A missing piece of information might be that repeated brackets aren’t special syntax but just repeated indexing, as you can see by quoting the expression.

julia> :(A[i][j])
:((A[i])[j])
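
Spelled out with explicit function calls, the same thing looks roughly like the sketch below. The values of `i` and `j` and the matrix `M` are arbitrary choices for illustration.

```julia
A = [(1, 2), (2, 3), (3, 4)]
i, j = 2, 1

# A[i][j] is two separate getindex calls, applied left to right
inner = getindex(A, i)        # the tuple (2, 3)
value = getindex(inner, j)    # its first entry

# By contrast, M[i, j] is a single call with two indices (meant for matrices)
M = [1 2; 2 3; 3 4]
entry = getindex(M, i, j)     # row 2, column 1
```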

If you have control of the function you can choose to package your multiple values in a vector before returning them.

julia> f(i) = [i, i + 1]
f (generic function with 1 method)

julia> A = f.([1, 2, 3])
3-element Array{Array{Int64,1},1}:
 [1, 2]
 [2, 3]
 [3, 4]

In itself this doesn’t make it any easier to slice out a part than with a vector of tuples, but it makes it a little more convenient to repackage the data in an array that can be sliced.

julia> B = reduce(hcat, A)
2×3 Array{Int64,2}:
 1  2  3
 2  3  4

julia> B[1, :]
3-element Array{Int64,1}:
 1
 2
 3

However, you are probably better off with some kind of dot vectorization or comprehension solution like David proposed.

rdeits · June 5, 2018, 1:32pm

Returning an array would be much more expensive than returning a tuple, as an array has a significant amount of overhead compared to a tuple (which can be exactly zero-overhead in many circumstances), which is why it’s not done. Also, I’m not convinced that that would help in your case anyway, as a vector-of-vectors is still a completely different structure than a 2D matrix, and something like A[:][1] would not work even if each element of A were a vector.
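
One rough way to see the difference between the two representations is to ask Julia whether each type is a plain-bits type. This is only a sketch of the idea and not a benchmark; the variable names are illustrative.

```julia
# A tuple of two Float64 values is a plain-bits type: it can live inline
# in an array or on the stack without a separate heap object
tuple_is_bits = isbitstype(Tuple{Float64,Float64})

# A Vector is a mutable, heap-allocated container, so it is not
vector_is_bits = isbitstype(Vector{Float64})

# Size of the tuple in bytes: two 8-byte floats with no extra header
tuple_bytes = sizeof(Tuple{Float64,Float64})
```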

Maybe you can describe more about what the actual problem you’re trying to solve is?

jcrbloch · June 5, 2018, 1:47pm

Dear Robin,

Thanks for taking the time to reply to this. I understand your point about an array of arrays not being a matrix. The actual problem is that I have a function f(s) computing various measurements for a specific parameter value s. I am calling this function on an array of parameter values using f.(s_array). What I want to end up with is several arrays of measurements, which will effectively hold m1(s_array), m2(s_array), … where m1, m2 are my measurements.

What I could do, I guess, is define a struct, but I wanted to avoid this complication. Or, as Gunnar just wrote, I could put the measurements in an array inside the function and return the array. In the latter case I would end up with an array of arrays, while in the former case I would end up with an array of structs… not sure if that is any better. At the end of the day I want to plot each measurement versus the parameter values. Seems like a very standard problem.
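
For reference, the struct variant could look something like the sketch below. The type name `Measurement`, the field types, and the body of `f` are all placeholders, since the real function is not shown in this thread.

```julia
# Hypothetical result type; the field names m1 and m2 are placeholders
struct Measurement
    m1::Float64
    m2::ComplexF64
end

# Hypothetical stand-in for the real computation
f(s) = Measurement(2s, s + 1im)

s_array = [0.5, 1.0, 1.5]
results = f.(s_array)

# One array per measurement, extracted by field name
m1s = [r.m1 for r in results]
m2s = getproperty.(results, :m2)
```

Either extraction style works; the named fields mainly buy readability over positional tuple indices.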

Cheers,

Jacques

rdeits · June 5, 2018, 1:56pm

I see, thanks. One easy solution is to do what @dpsanders suggested above and use a comprehension:

julia> function f(x)
         x, 2x
       end
f (generic function with 1 method)

julia> xs = [1,2,3]
3-element Array{Int64,1}:
 1
 2
 3

julia> ys = f.(xs)
3-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (2, 4)
 (3, 6)

julia> [y[1] for y in ys]
3-element Array{Int64,1}:
 1
 2
 3

julia> [y[2] for y in ys]
3-element Array{Int64,1}:
 2
 4
 6

This should be pretty efficient and easy to generalize. If you don’t want to use the comprehension syntax, you can also broadcast the getindex function (that’s the function that’s called when you do foo[bar]):

julia> getindex.(ys, 1)
3-element Array{Int64,1}:
 1
 2
 3

julia> getindex.(ys, 2)
3-element Array{Int64,1}:
 2
 4
 6

If you want to get really fancy, there’s one more trick you can try, as long as all of your returned values are of the same type and that type is isbits (so, a primitive immutable type like Float64 or a struct or tuple made up of primitive immutables). If all of your parameter values are Float64 or Int, for example, then this will work. All we have to do is rely on the fact that isbits types are stored inline in arrays, and we can freely transform an array-of-tuples into a matrix with reinterpret():

julia> r = reinterpret(Int, ys, (2, 3))
2×3 Array{Int64,2}:
 1  2  3
 2  4  6

julia> r[1, :]
3-element Array{Int64,1}:
 1
 2
 3

julia> r[2, :]
3-element Array{Int64,1}:
 2
 4
 6

Edit: be warned that it’s easy to forget the column-major layout order and get the reinterpret size wrong. I had to edit this post because I got it wrong myself!

Which of these is appropriate will depend on your particular needs, and I suggest using BenchmarkTools.jl (JuliaCI/BenchmarkTools.jl on GitHub) to determine which performs best for your case.

jcrbloch · June 5, 2018, 2:06pm

Dear Robin,

Thanks for the detailed explanation. To me it seems that the getindex function gets really close to what I was originally looking for!

The reinterpret won’t work in this case because the return values are of different types (floats and complex).
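
A small sketch of that mixed-type case, with a made-up function `g` standing in for the real one: broadcasting `getindex` still gives one concretely typed vector per tuple position.

```julia
# Hypothetical function returning a Float64 and a ComplexF64
g(s) = (s^2, exp(im * s))

ys = g.([0.0, 1.0, 2.0])

# Each broadcast call yields a concretely typed vector
reals = getindex.(ys, 1)
phases = getindex.(ys, 2)
```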

Cheers,

Jacques

DrPapa · November 16, 2018, 3:14pm

This is extremely helpful. Alternatively, is there an efficient way to do this using view? I put together the following benchmark for comprehension vs. view vs. getindex and was surprised by the results. I guess I expected view to be faster for large arrays. Am I missing something, or is there a good reason for view being slow? Why are there so many allocations?

using BenchmarkTools

N = 10
data = [rand(3) for i ∈ 1:N]

@btime [q[2] for q in $data]
@btime getindex.($data, 2)
@btime view.($data,2)

returns

57.476 ns (2 allocations: 176 bytes)
48.976 ns (1 allocation: 160 bytes)
108.226 ns (11 allocations: 640 bytes)

For N = 1000000, I get:

7.812 ms (3 allocations: 7.63 MiB)
7.815 ms (2 allocations: 7.63 MiB)
35.088 ms (1000002 allocations: 53.41 MiB)

rdeits · November 16, 2018, 6:14pm

Don’t worry about the vector-of-tuples part of this, just look at what each scalar operation you’re doing is. getindex.(data, 2) calls getindex(d, 2) for each d in data, while view.(data, 2) calls view(d, 2) for each d.

In your case, d is a 3-element vector, so getindex(d, 2) returns the second index, which is just a Float64. That’s an extremely cheap operation. On the other hand, view(d, 2) actually constructs a View representing the second element of d. While a View may be cheaper to allocate than a new Array (that’s the whole point, after all), they’re not cheaper to allocate than just a single Float64.

On the other hand, if you did something like:

data = [rand(10000) for i in 1:N]
@btime getindex.($data, Ref(1:1000))
@btime view.($data, Ref(1:1000))

then you might see view coming out ahead, since it will avoid creating a 1000-element copy of each element of data.
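
The copy-versus-view distinction itself can be seen on a single small array. This is a sketch with made-up values and is not a timing comparison.

```julia
data = collect(1.0:10.0)

# getindex with a range copies the selected elements
copied = data[1:3]

# view refers to the same memory as the parent array
window = view(data, 1:3)

data[1] = 100.0   # mutate the parent

copied_first = copied[1]   # still the old value
window_first = window[1]   # sees the update
```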

DrPapa · November 16, 2018, 6:24pm

This makes sense. Correct me if I’m wrong, but in my usage of view.() I was essentially creating an array of N views vs a view to an N length Array.

Seif_Shebl · November 16, 2018, 6:43pm

Also broadcasting works fine:

julia> A = [(1, 2), (2, 3), (3, 4)];

julia> I1 = (x->x[1]).(A)
3-element Array{Int64,1}:
 1
 2
 3

julia> I2 = (x->x[2]).(A)
3-element Array{Int64,1}:
 2
 3
 4

Shuhua · August 21, 2020, 12:02pm

It seems the API has changed in v1.5 (or earlier versions). Now reinterpret no longer supports the third argument. We have to combine reinterpret and reshape to achieve the same purpose.

julia> r = reshape(reinterpret(Int, ys), (2, 3))
2×3 reshape(reinterpret(Int64, ::Array{Tuple{Int64,Int64},1}), 2, 3) with eltype Int64:
 1  2  3
 2  4  6

Here, reinterpret(Int, ys) first yields a 1D array.
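
A slightly more defensive variant, as a sketch: passing a colon to reshape lets Julia infer the number of columns, so only the tuple length (2 here) has to be written down. The `ys` below repeats the values from the earlier example.

```julia
ys = [(1, 2), (2, 4), (3, 6)]

# Flat reinterpretation: the tuples laid out one after another
flat = reinterpret(Int, ys)

# Let reshape infer the number of columns with a colon
r = reshape(flat, 2, :)

# Row k holds the k-th entry of every tuple
firsts = r[1, :]
seconds = r[2, :]
```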

rdeits · August 21, 2020, 1:31pm

You’re right–this changed in v1.0.

afishy · August 28, 2020, 6:08am

Late to the party but I like this construction:

julia> A = [(1, 2), (2, 3), (3, 4)]
3-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (2, 3)
 (3, 4)

julia> tmp = map(x -> getindex.(A, x), 1:2)
2-element Array{Array{Int64,1},1}:
 [1, 2, 3]
 [2, 3, 4]

julia> out = reduce(hcat, tmp)
3×2 Array{Int64,2}:
 1  2
 2  3
 3  4
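
As a possible shortcut, a two-dimensional comprehension builds the same matrix in one step. This is a sketch, with `1:2` hard-coding the tuple length.

```julia
A = [(1, 2), (2, 3), (3, 4)]

# Rows follow the elements of A, columns follow the tuple positions
out = [t[j] for t in A, j in 1:2]
```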

herm · June 20, 2024, 9:39am

An array of tuples can be initialized from arrays as follows:

function f(x,y,i)
    x[i], y[i]
end
U1=[1,2,3,4]
U2=[5,6,7,8]
f.(U1,U2,1...)

julia> function f(x,y,i)
x[i], y[i]
end
f (generic function with 1 method)

julia> U1=[1,2,3,4]
4-element Vector{Int64}:
1
2
3
4

julia> U2=[5,6,7,8]
4-element Vector{Int64}:
5
6
7
8

julia> f.(U1,U2,1...)
4-element Vector{Tuple{Int64, Int64}}:
(1, 5)
(2, 6)
(3, 7)
(4, 8)
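
The same pairing can also be sketched without a helper function, using `zip` or the built-in `tuple` constructor. The names `pairs_zip` and `pairs_bc` are just illustrative.

```julia
U1 = [1, 2, 3, 4]
U2 = [5, 6, 7, 8]

# Pair up corresponding elements without a helper function
pairs_zip = collect(zip(U1, U2))

# The same result by broadcasting the built-in tuple constructor
pairs_bc = tuple.(U1, U2)
```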