What is the easiest way to transform an array of objects into a data table?

Let’s say you have a struct Foo that has 30 or so named fields:

struct Foo
  a::Int  
  b::AbstractFloat 
  ...
  y::Bool 
  z::Int 
end

Now you want to store it (without the use of JLD).

What is the go-to method for transforming a vector of these Foo objects into a table?


I’ve tried DataFrames, DataTables, DataData, DataLife, DataTaxes, etc.

But it seems like this extremely simple use case is never spelled out in layman's terms.

Could you expand? Do you want each Foo object to be a column in a DataFrame, with each row a field, or do you want a vector of Foo objects?

I would like to have:

  • each Foo object in the array as a row in the database
  • with the columns as the fields (that are assumed to have a fixed-byte representation)

edit: here’s a quick rundown of how it would look:

index   a    b     y      z
1       2    3.1   true   1
2       52   0.4   false  2
98      22   1.5   false  2
99      -1   5.7   false  1
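Since the goal is to store the table without JLD, one option is to write that layout straight to delimited text using only Base. This is a minimal sketch, not a definitive implementation: the four-field Foo is a cut-down stand-in for the real struct, and writetable is a hypothetical helper name.

```julia
# Cut-down stand-in for the 30-field Foo from the question (assumption).
struct Foo
    a::Int
    b::Float64
    y::Bool
    z::Int
end

# Hypothetical helper: write a header line of field names,
# then one delimited line per object.
function writetable(io::IO, v::AbstractVector{T}; delim = ",") where {T}
    names = fieldnames(T)
    println(io, join(names, delim))
    for x in v
        println(io, join((getfield(x, f) for f in names), delim))
    end
end

foos = [Foo(2, 3.1, true, 1), Foo(52, 0.4, false, 2)]

# Write into an in-memory buffer; open("foos.csv", "w") would work the same way.
io = IOBuffer()
writetable(io, foos)
text = String(take!(io))
```

Here text holds a header line followed by one line per Foo. Fields are written with print, so this only round-trips cleanly for simple scalar fields such as the ones shown.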

Does this work?

function makeVec(x::Foo)
    t = fieldnames(typeof(x))
    [getproperty(x, field) for field in t]
end

df = DataFrame(a = Int[], b = Float64[], ...)  # allocate the remaining column names and types

for x in VectorOfFoos
    push!(df, makeVec(x))
end
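A variant of the row-by-row idea is to turn each object into a NamedTuple first, so every row carries its own column names and nothing has to be allocated by hand. A minimal sketch, assuming a two-field Foo and a hypothetical helper torow (not part of DataFrames):

```julia
# Two-field stand-in for the real Foo (assumption).
struct Foo
    a::Int
    b::Float64
end

# Hypothetical helper: convert one struct into a NamedTuple keyed by field name.
function torow(x::T) where {T}
    values = ntuple(i -> getfield(x, i), fieldcount(T))
    return NamedTuple{fieldnames(T)}(values)
end

foos = [Foo(1, 2.0), Foo(2, 4.0)]
rows = [torow(x) for x in foos]
```

A vector of NamedTuples like rows is a row table in the Tables.jl sense, so with DataFrames loaded, DataFrame(rows) should build the table in one call, and push!(df, torow(x)) should work for appending to an existing one.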

You could also do

df = DataFrame(a = [x.a for x in VectorOfFoos], b = [x.b for x in VectorOfFoos], ...)
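With 30 fields, typing one comprehension per column gets tedious, so the same idea can be driven by fieldnames. The sketch below uses only Base and assumes a two-field Foo; columns is a hypothetical helper name.

```julia
# Two-field stand-in for the real Foo (assumption).
struct Foo
    a::Int
    b::Float64
end

# Hypothetical helper: build one vector per field and label it with the field name.
function columns(v::AbstractVector{T}) where {T}
    names = fieldnames(T)
    cols = map(name -> [getfield(x, name) for x in v], names)
    return NamedTuple{names}(cols)
end

foos = [Foo(1, 2.0), Foo(2, 4.0)]
cols = columns(foos)
```

The result is a NamedTuple of vectors, (a = [1, 2], b = [2.0, 4.0]), which is a column table in the Tables.jl sense; with DataFrames loaded, DataFrame(cols) should give one row per Foo and one column per field.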

Using StructArrays (not yet released) and the latest DataFrames, something like this is possible:

julia> using DataFrames, StructArrays

julia> struct Foo
         a::Int
         b::Float64
       end

julia> c = [Foo(1, 2.0), Foo(2, 4.0)]
2-element Array{Foo,1}:
 Foo(1, 2.0)
 Foo(2, 4.0)

julia> DataFrame(StructArray(c))
2×2 DataFrame
│ Row │ a     │ b       │
│     │ Int64 │ Float64 │
├─────┼───────┼─────────┤
│ 1   │ 1     │ 2.0     │
│ 2   │ 2     │ 4.0     │

Edit: Updated the answer since master DataFrames makes this easier.

For shorter structs, you might also be able to do something like:
reduce(vcat, map(x-> [x.a x.b], arrFoo))

Addendum: there might be a performance regression when using mapreduce() instead of reduce(vcat, map(...)):

struct n; a; b; end;
p = fill(n(rand(1:10), rand()*10), 10^6);
@time mapreduce(x->[x.a x.b], vcat, p);
# 5.99 seconds
@time reduce(vcat, map(x-> [x.a x.b], p));
# ~0.85 seconds
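If the target is a plain matrix rather than a DataFrame, both versions above still allocate a small 1×2 array per element. A sketch that avoids this by preallocating the result and filling it in a loop, assuming a typed two-field struct Rec and a hypothetical helper tomatrix (no timings are claimed here):

```julia
# Typed stand-in for the untyped struct n used in the timings (assumption).
struct Rec
    a::Int
    b::Float64
end

# Hypothetical helper: preallocate an N×2 matrix and fill it row by row.
function tomatrix(v::AbstractVector{Rec})
    out = Matrix{Float64}(undef, length(v), 2)
    for (i, x) in enumerate(v)
        out[i, 1] = x.a   # Int is converted to Float64 on assignment
        out[i, 2] = x.b
    end
    return out
end

p = [Rec(1, 2.0), Rec(2, 4.0), Rec(3, 6.0)]
m = tomatrix(p)

# Same result as the reduce/vcat approach from the post.
m2 = reduce(vcat, map(x -> [x.a x.b], p))
```

This only makes sense when all fields can share one element type; for mixed fields such as Bool alongside Float64, a column-per-field layout keeps the types intact.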