DataFrame from array of arrays

mthelm85 · December 23, 2018, 3:08pm

Hello, all. I’m trying to create a DataFrame from an array of arrays that is returned from an API, and I’m having a heck of a time getting this coded! The goal is to not have to hard-code the column names, as these may change depending on what parameters are used when calling the API.

The array that I am trying to convert to a DataFrame looks like this:

78-element Array{Any,1}:
 Any["Emp", "year", "quarter", "sex", "agegrp", "ownercode", "firmsize", "seasonadj", "industry", "state", "county"]
 Any["3410", "2017", "3", "0", "A00", "A05", "0", "U", "00", "40", "001"]
 Any["915", "2017", "3", "0", "A00", "A05", "0", "U", "00", "40", "003"]
 ⋮
 Any["23884", "2017", "3", "0", "A00", "A05", "0", "U", "00", "40", "131"]
 Any["5099", "2017", "3", "0", "A00", "A05", "0", "U", "00", "40", "133"]

If I hard-code the column names, I can create the DataFrame like this:

df = DataFrame(Emp=String[], year=String[], quarter=String[], sex=String[], agegrp=String[], ownercode=String[], firmsize=String[], seasonadj=String[], industry=String[], state=String[], county=String[])
for i = 2 : length(employment_data)
    push!(df.Emp, employment_data[i][1])
    push!(df.year, employment_data[i][2])
    push!(df.quarter, employment_data[i][3])
    push!(df.sex, employment_data[i][4])
    push!(df.agegrp, employment_data[i][5])
    push!(df.ownercode, employment_data[i][6])
    push!(df.firmsize, employment_data[i][7])
    push!(df.seasonadj, employment_data[i][8])
    push!(df.industry, employment_data[i][9])
    push!(df.state, employment_data[i][10])
    push!(df.county, employment_data[i][11])
end

Aside from being ridiculously verbose, the column names here can’t be changed. It seems like there should be an easy way to loop through the first array (as it contains the column names), create a DataFrame with these column names, and then push the rest of the arrays to the DataFrame, but I cannot achieve a working solution.
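The idea described above (take the column names from the first array, then push the remaining arrays as rows) could be sketched roughly as follows. This is only a sketch under assumptions: it assumes DataFrames.jl 1.x, uses a shortened three-column stand-in for `employment_data` rather than the real API response, and assumes every value is a `String`.

```julia
using DataFrames

# Hypothetical, shortened stand-in for the API response:
# the first array holds the column names, the rest are data rows.
employment_data = Any[
    Any["Emp", "year", "county"],
    Any["3410", "2017", "001"],
    Any["915", "2017", "003"],
]

# Column names come from the first array instead of being hard-coded.
header = Symbol.(employment_data[1])

# Build an empty DataFrame with one String column per header entry.
df = DataFrame([name => String[] for name in header])

# Push every remaining array as one row.
for row in employment_data[2:end]
    push!(df, row)
end
```

Because the header is read at run time, the same code should keep working if the API returns a different set of columns.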

pdeffebach · December 23, 2018, 3:26pm

I would probably use a comprehension for this. Say x is the name of your array.

using DataFrames

namelist = Symbol.(x[1])
df = DataFrame()

for (i, name) in enumerate(namelist)
    # DataFrames 0.19 and later: df[!, name]; older releases wrote df[name]
    df[!, name] = [x[j][i] for j in 2:length(x)]
end

piever · December 23, 2018, 3:31pm

There also is a constructor DataFrame(columns, names), so for example:

julia> columns = Any[rand(10), rand(10)];

julia> DataFrame(columns, [:col1, :col2])
10×2 DataFrame
│ Row │ col1     │ col2     │
│     │ Float64  │ Float64  │
├─────┼──────────┼──────────┤
│ 1   │ 0.620302 │ 0.763272 │
│ 2   │ 0.591029 │ 0.335824 │
│ 3   │ 0.684387 │ 0.24118  │
│ 4   │ 0.282933 │ 0.542262 │
│ 5   │ 0.942279 │ 0.185193 │
│ 6   │ 0.35253  │ 0.500711 │
│ 7   │ 0.74824  │ 0.49447  │
│ 8   │ 0.102255 │ 0.660015 │
│ 9   │ 0.485545 │ 0.897344 │
│ 10  │ 0.12191  │ 0.43754  │

EDIT: Never mind, I thought you had the columns rather than the rows, so the above doesn't apply directly.

pdeffebach · December 23, 2018, 3:36pm

Each array is a row here, though. So OP would need to do some reshaping to get it to work.
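For illustration, the reshaping could look something like the sketch below: the rows are regrouped into columns first, and the result is passed to the `DataFrame(columns, names)` constructor mentioned above. It assumes DataFrames.jl 1.x, a shortened hypothetical stand-in for `x`, and that all values are `String`s.

```julia
using DataFrames

# Hypothetical, shortened stand-in for the array of arrays:
# header first, then one array per row.
x = Any[
    Any["Emp", "year", "county"],
    Any["3410", "2017", "001"],
    Any["915", "2017", "003"],
]

colnames = Symbol.(x[1])
rows = x[2:end]

# Regroup row-wise data into columns: column i collects the i-th entry of every row.
columns = [String[row[i] for row in rows] for i in eachindex(colnames)]

# Vector of column vectors plus a vector of Symbols.
df = DataFrame(columns, colnames)
```

The typed comprehension `String[...]` gives each column a concrete element type instead of `Any`.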

mthelm85 · December 24, 2018, 12:25am

This is brilliant, thank you. For future readers who may also be new to Julia (like me), I’ve added some comments to your code which I think explain what is happening (please correct if I’m wrong!):

#= convert each item in array x[1] to a Symbol by broadcasting Symbol() across the array
with dot syntax =#
namelist = Symbol.(x[1])

# construct an empty DataFrame
df = DataFrame()

#= loop through the namelist array, create a column in the DataFrame named namelist[i],
and assign its values by using an array comprehension to build an array with the
appropriate values, starting at the second array in array x
(DataFrames 0.19 and later use df[!, name]; older releases wrote df[name]) =#
for (i, name) in enumerate(namelist)
    df[!, name] = [x[j][i] for j in 2:length(x)]
end
