Hi, I want to convert the following struct to a DataFrame:
struct MyStruct
x::Vector{Float64}
y::Vector{Float64}
end
I have written the following function to convert the fields of MyStruct and their values to a DataFrame:
function struct_to_dataframe(s::MyStruct)
# Create an empty data frame
df = DataFrame()
fields = fieldnames(s)
# Get the values for each field
values = [getfield(s,field) for field in fields]
push!(df, (field=fields, value=values))
return df
end
#Create instance
s = MyStruct(rand(10),rand(10))
#Convert to DF
df = struct_to_dataframe(s)
But the outcome is something like this:
1×2 DataFrame
Row │ field value
│ Tuple… Array…
─────┼─────────────────────────────────────────────
1 │ (:x, :y) [[0.976849, 0.653888, 0.956756, …
Instead, I want each field to be its own column containing that field's values. How can I do that? Thanks!!!
The mistake is probably in the call to fieldnames. It requires the struct type itself, not an instance. Use the typeof function to get the type (the struct) of an instance.
julia> struct MyStruct
x::Vector{Float64}
y::Vector{Float64}
end
julia> function struct_to_dataframe(s)
fields = fieldnames(typeof(s))
still_vector = [getfield(s,field) for field in fields]
return DataFrame(hcat(still_vector...), collect(fields))
end
struct_to_dataframe (generic function with 1 method)
julia> s = MyStruct(rand(10),rand(10));
julia> struct_to_dataframe(s)
10×2 DataFrame
Row │ x y
│ Float64 Float64
─────┼──────────────────────
1 │ 0.286858 0.140575
2 │ 0.253701 0.821031
3 │ 0.767813 0.474758
4 │ 0.669886 0.74336
5 │ 0.463448 0.956407
6 │ 0.692433 0.715941
7 │ 0.520875 0.447658
8 │ 0.783861 0.00811924
9 │ 0.928598 0.885618
10 │ 0.681593 0.271318
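The type-versus-instance point can be checked in isolation, without DataFrames. The following is a minimal sketch assuming Julia 1.x; the variable names (names_from_type, names_from_instance, instance_call_failed, columns) are only illustrative and do not come from the thread.

```julia
# Same struct as in the thread
struct MyStruct
    x::Vector{Float64}
    y::Vector{Float64}
end

s = MyStruct(rand(3), rand(3))

# fieldnames expects a type, so pass typeof(s) when only an instance is at hand
names_from_type = fieldnames(MyStruct)
names_from_instance = fieldnames(typeof(s))

# Passing the instance itself throws an error, which the try block records
instance_call_failed = try
    fieldnames(s)
    false
catch
    true
end

# One column per field, in declaration order
columns = [getfield(s, name) for name in names_from_instance]
```

Both calls that receive a type return the tuple of field names as Symbols, which is why collect(fields) can serve as the column names in the function above.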
@rmsmsgood thank you so much! Related to my question: if I declare one of the fields of MyStruct as an integer, I get a dimension mismatch error. How can I handle that?
You mean declaring one of the fields of MyStruct as an integer, not an integer vector? Like this?
struct MyStruct
x::Vector{Float64}
y::Vector{Float64}
k::Int64
end
I guess that is a dimension error because x, y, and k have different lengths and hcat tries to combine them into one matrix. If x and y have length 4, the matrix would have the form below:
x[1] y[1] k[1]
x[2] y[2] 😡
x[3] y[3] 😡
x[4] y[4] 😡
so hcat probably raises the dimension mismatch error because k[2], k[3], and k[4] do not exist. You can handle this in several ways:
- Turn the integer into an integer vector. The repeat function may fit your needs, e.g. k ↦ [k, k, k, …, k] via repeat([k], n); fill(k, n) gives the same result.
- Just skip that field. In the for loop, you could check the type or length of each field and skip any field that is not an array.
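Both options can be sketched with plain Base Julia. This is only an illustration: x, y, and k below are stand-ins for the struct's fields, and k_col, combined, vector_cols, and vectors_only are hypothetical names.

```julia
# Stand-ins for the fields of the three-field struct
x = [0.1, 0.2, 0.3, 0.4]
y = [0.5, 0.6, 0.7, 0.8]
k = 3

# Mixing two length-4 vectors with a scalar makes hcat throw
hcat_failed = try
    hcat(x, y, k)
    false
catch
    true
end

# Option 1: expand the scalar into a vector of matching length
k_col = fill(k, length(x))
combined = hcat(x, y, k_col)

# Option 2: keep only the values that are vectors and skip the rest
vector_cols = [v for v in (x, y, k) if v isa AbstractVector]
vectors_only = hcat(vector_cols...)
```

With option 1 the integer column is promoted to Float64 inside the matrix, so a per-column approach may be preferable if k should stay an integer.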
If you mean something like this, the DataFrame constructor broadcasts scalar values automatically:
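using DataFrames
struct MS
    x::Vector{Float64}
    y::Vector{Float64}
    k::Int64
end
s = MS(rand(10), rand(10), rand(1:10))
# Collect the value of every field, in declaration order
vals = getfield.(Ref(s), 1:fieldcount(typeof(s)))
# Pass (name, value) pairs as keyword arguments; the scalar k is broadcast to the column length
DataFrame(; zip(propertynames(s), vals)...)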