djsegal
September 21, 2018, 7:36pm
1
Let’s say you have a struct Foo, that has 30 or so named fields:
struct Foo
    a::Int
    b::AbstractFloat
    ...
    y::Bool
    z::Int
end
Now you want to store it (without the use of JLD).
What is the go-to method for transforming a vector of these Foo objects into a table?
I’ve tried DataFrames, DataTables, DataData, DataLife, DataTaxes, etc.
But it seems like this extremely simple use-case is never spelled out in laymanese.
Could you expand? Do you want each Foo object to be a column in a DataFrame, with each row a field, or do you want a vector of Foo objects?
djsegal
September 21, 2018, 7:49pm
3
I would like to have:
each Foo object in the array as a row in the database
with the columns as the fields (that are assumed to have a fixed-byte representation)
edit: here’s a quick rundown of how it would look:
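| index | a  | b   | … | y     | z |
|-------|----|-----|---|-------|---|
| 1     | 2  | 3.1 | … | true  | 1 |
| 2     | 52 | 0.4 | … | false | 2 |
| …     | …  | …   | … | …     | … |
| 98    | 22 | 1.5 | … | false | 2 |
| 99    | -1 | 5.7 | … | false | 1 |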
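For the "store it without JLD" part, the layout above can be written out as delimited text with nothing but Base Julia. This is only a minimal sketch: the helper name `write_table` and the three-field `Foo` are made up for illustration, the index column is left out, and no quoting or escaping is done, so it assumes the fields print as plain numbers or booleans.

```julia
# Hypothetical three-field stand-in for the 30-field struct in the question.
struct Foo
    a::Int
    b::Float64
    y::Bool
end

# Write one header line with the field names, then one comma-separated line per object.
function write_table(io::IO, xs::AbstractVector{T}) where {T}
    names = fieldnames(T)
    println(io, join(names, ','))
    for x in xs
        println(io, join((getfield(x, n) for n in names), ','))
    end
end

# Write into an in-memory buffer; an open file handle would work the same way.
io = IOBuffer()
write_table(io, [Foo(2, 3.1, true), Foo(52, 0.4, false)])
csv = String(take!(io))
```

Passing a file opened with `open("foos.csv", "w")` instead of the `IOBuffer` would put the same text on disk (the filename is just an example).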
Does this work?
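function makeVec(x::Foo)
    t = fieldnames(typeof(x))
    # Any[...] keeps each value as-is instead of promoting mixed fields to one type
    Any[getproperty(x, field) for field in t]
end
# Allocate one empty, typed column per field, with names matching the struct
df = DataFrame(a = Int[], b = Float64[], ...)  # remaining names and types elided
for x in VectorOfFoos
    push!(df, makeVec(x))
end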
You could also do
df = DataFrame(a = [x.a for x in VectorOfFoos], b = [x.b for x in VectorOfFoos]...)
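Spelling out one comprehension per field gets tedious with 30 fields. Here is a minimal sketch of the same idea driven by `fieldnames`, using only Base Julia. The helper name `columns` and the three-field `Foo` are made up for illustration and are not part of DataFrames.

```julia
# Hypothetical three-field stand-in for the 30-field struct in the question.
struct Foo
    a::Int
    b::Float64
    y::Bool
end

# Build a NamedTuple with one column vector per field of T.
function columns(xs::AbstractVector{T}) where {T}
    names = fieldnames(T)
    # One comprehension per field, generated instead of written by hand.
    cols = map(name -> [getfield(x, name) for x in xs], names)
    return NamedTuple{names}(cols)
end

VectorOfFoos = [Foo(2, 3.1, true), Foo(52, 0.4, false)]
cols = columns(VectorOfFoos)
```

A NamedTuple of equal-length vectors is a column table in the Tables.jl sense, so `DataFrame(cols)` should accept it in recent DataFrames versions; treat that as an assumption to check against the version you have installed.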
Using the (as yet unreleased) StructArrays and the latest DataFrames, something like this is possible:
julia> using DataFrames, StructArrays
julia> struct Foo
           a::Int
           b::Float64
       end
julia> c = [Foo(1, 2.0), Foo(2, 4.0)]
2-element Array{Foo,1}:
Foo(1, 2.0)
Foo(2, 4.0)
julia> DataFrame(StructArray(c))
2×2 DataFrame
│ Row │ a │ b │
│ │ Int64 │ Float64 │
├─────┼───────┼─────────┤
│ 1 │ 1 │ 2.0 │
│ 2 │ 2 │ 4.0 │
Edit: Updated the answer since master DataFrames makes this easier.
y4lu
September 22, 2018, 1:01am
6
For shorter structs you might also be able to do something like:
reduce(vcat, map(x-> [x.a x.b], arrFoo))
Add: There might be a performance regression using mapreduce()
struct n; a; b; end
p = fill(n(rand(1:10), rand()*10), 10^6);
@time mapreduce(x->[x.a x.b], vcat, p);
# 5.99 seconds
@time reduce(vcat, map(x-> [x.a x.b], p));
# ~0.85 seconds
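One thing to keep in mind with the `reduce(vcat, ...)` one-liner: `[x.a x.b]` builds a one-row matrix, so mixed field types are promoted to a common element type and the per-column types are lost. A small sketch, using the two-field `Foo` from the StructArrays example earlier in the thread:

```julia
struct Foo
    a::Int
    b::Float64
end

arrFoo = [Foo(1, 2.0), Foo(2, 4.0)]

# Each [x.a x.b] is a 1×2 row; Int and Float64 promote to Float64.
m = reduce(vcat, map(x -> [x.a x.b], arrFoo))

# The result is a plain 2×2 matrix, with the Int field stored as Float64.
@show size(m) eltype(m)
```

So this route gives a matrix rather than a table with typed, named columns, which may or may not matter for storage.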