DataFrame construction from array of tuples


#1

Hi all. What’s the best way to convert an array of tuples into a DataFrame?

For example, I’d like to construct a data frame with column names :a and :b for the data below:

julia> data = [(1,2),(4,5)]
2-element Array{Tuple{Int64,Int64},1}:
 (1, 2)
 (4, 5)

#2

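Something like df = DataFrame(a = [x[1] for x in data], b = [x[2] for x in data]) would work.

Alternatively, you could first convert the tuples into a Float64 matrix, which can then be turned into a DataFrame:

function tuple2mat(data)
    n = length(first(data))                      # number of fields per tuple
    m = Matrix{Float64}(undef, length(data), n)  # one row per tuple
    for (i, t) in enumerate(data)
        m[i, :] = collect(t)                     # copy tuple i into row i
    end
    return m
end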
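A minimal sketch of the same matrix idea as a two-dimensional comprehension, assuming every tuple has the same length and only numeric fields. Unlike the loop above, it keeps the tuples' element type instead of converting everything to Float64. For mixed-type tuples such as the ones in #5, the matrix would fall back to a loose element type (e.g. Any), so a column-wise approach fits better there. The DataFrame(m, [:a, :b]) call is DataFrames.jl 1.x syntax and is left as a comment so the snippet runs with Base Julia alone.

```julia
# Build a matrix from a vector of equal-length, all-numeric tuples:
# rows correspond to tuples, columns to tuple fields.
data = [(1, 2), (4, 5)]

ncols = length(first(data))               # fields per tuple
m = [t[j] for t in data, j in 1:ncols]    # m[i, j] == data[i][j]; 2×2 Matrix{Int64}

# With DataFrames.jl loaded, the matrix can then be given column names:
# using DataFrames
# df = DataFrame(m, [:a, :b])
```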

#3
df = DataFrame(map(idx -> getindex.(data, idx), eachindex(data)), [:a, :b])
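To see what the map/getindex one-liner does, here it is split into steps on the 2-element example (Base Julia only; the names col1, cols, colnames and table are just for illustration). Note that the field positions are taken from the tuple length via first(data).

```julia
data = [(1, 2), (4, 5)]

# getindex.(data, 1) broadcasts getindex over the vector: field 1 of every tuple.
col1 = getindex.(data, 1)                  # [1, 4]

# Doing that for every field position yields one vector per column.
cols = map(idx -> getindex.(data, idx), eachindex(first(data)))   # [[1, 4], [2, 5]]

# Pairing the columns with names gives a NamedTuple of vectors, a column
# table that Tables.jl-aware packages such as DataFrames.jl can consume.
colnames = (:a, :b)
table = NamedTuple{colnames}(Tuple(cols))  # (a = [1, 4], b = [2, 5])
```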

#4

Thanks. That’s what I thought as well, but it seems somewhat inefficient, since each comprehension makes its own pass over the data. My real case has 100k rows and many columns. Also, typing x[1] for x in data, x[2] for x in data, etc. for every column is tedious.
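One way to address both points is a single pass over the rows that fills preallocated, typed columns. This is a sketch under stated assumptions: tuples_to_columns is a hypothetical helper (not part of DataFrames.jl), data must have a concrete tuple element type such as Tuple{String,Int64,Int64} so that fieldtypes can read off the column types, and indices are assumed to start at 1. It returns a NamedTuple of vectors, which DataFrames.jl can wrap with DataFrame(cols). Whether this is actually faster than one broadcast per column on 100k rows is worth measuring rather than assuming.

```julia
# Hypothetical helper (not a DataFrames.jl function): converts a vector of
# equal-length tuples into a NamedTuple of typed column vectors in one pass.
# Assumes a concrete tuple element type and 1-based indexing of `data`.
function tuples_to_columns(data::AbstractVector{T}, colnames) where {T<:Tuple}
    coltypes = fieldtypes(T)                        # e.g. (String, Int64, Int64)
    length(coltypes) == length(colnames) ||
        throw(ArgumentError("expected one name per tuple field"))
    n = length(data)
    cols = [Vector{S}(undef, n) for S in coltypes]  # one uninitialized vector per column
    for (i, row) in enumerate(data)                 # single pass over the rows
        for j in eachindex(row)
            cols[j][i] = row[j]
        end
    end
    return NamedTuple{Tuple(colnames)}(Tuple(cols))
end

data = [("a", 1, 2), ("b", 4, 5), ("c", 6, 7), ("d", 8, 9)]
cols = tuples_to_columns(data, [:a, :b, :c])
# With DataFrames.jl loaded: df = DataFrame(cols)
```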


#5

Hi,

I couldn’t get this to work with a more elaborate example:

julia> data = [("a",1,2),("b",4,5),("c",6,7),("d",8,9)]
4-element Array{Tuple{String,Int64,Int64},1}:
 ("a", 1, 2)
 ("b", 4, 5)
 ("c", 6, 7)
 ("d", 8, 9)

julia> DataFrame(map(idx -> getindex.(data, idx), eachindex(data)), [:a, :b, :c])
ERROR: BoundsError: attempt to access ("a", 1, 2)
  at index [4]
Stacktrace:
 [1] getindex(::Tuple{String,Int64,Int64}, ::Int64) at ./tuple.jl:21
 [2] broadcast_t(::Function, ::Type{Any}, ::Tuple{Base.OneTo{Int64}}, ::CartesianRange{CartesianIndex{1}}, ::Array{Tuple{String,Int64,Int64},1}, ::Int64) at ./broadcast.jl:258
 [3] broadcast_c at ./broadcast.jl:321 [inlined]
 [4] broadcast(::Function, ::Array{Tuple{String,Int64,Int64},1}, ::Int64) at ./broadcast.jl:455
 [5] collect_to!(::Array{Array{T,1} where T,1}, ::Base.Generator{Base.OneTo{Int64},##19#20}, ::Int64, ::Int64) at ./array.jl:508
 [6] collect_to!(::Array{Array{String,1},1}, ::Base.Generator{Base.OneTo{Int64},##19#20}, ::Int64, ::Int64) at ./array.jl:518
 [7] _collect(::Base.OneTo{Int64}, ::Base.Generator{Base.OneTo{Int64},##19#20}, ::Base.EltypeUnknown, ::Base.HasShape) at ./array.jl:489
 [8] map(::Function, ::Base.OneTo{Int64}) at ./abstractarray.jl:1868


#6

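My bad, it should be:

df = DataFrame(map(idx -> getindex.(data, idx), eachindex(first(data))), Names)

where Names is the vector of column names, e.g. [:a, :b, :c]. The number of columns is given by the length of each tuple in data, not by the length of data itself. In the first example both happen to be 2, so the original version did not fail; here eachindex(data) runs to 4 while each tuple has only 3 fields, hence the BoundsError at index [4].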
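An alternative that avoids positional indexing altogether is to turn each tuple into a NamedTuple, so the column names travel with the rows. A minimal sketch, assuming a DataFrames.jl version that accepts Tables.jl sources (recent releases do); the DataFrame call is left as a comment so the snippet runs with Base Julia alone.

```julia
# Give each tuple field names by broadcasting the NamedTuple{names} constructor
# over the vector; mixed field types (String, Int64, ...) are preserved per row.
data = [("a", 1, 2), ("b", 4, 5), ("c", 6, 7), ("d", 8, 9)]
rows = NamedTuple{(:a, :b, :c)}.(data)    # rows[1] == (a = "a", b = 1, c = 2)

# A vector of NamedTuples is a row table under the Tables.jl interface,
# so with DataFrames.jl loaded this should be enough:
# using DataFrames
# df = DataFrame(rows)
```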