Outputting/Inputting vectors in DataFrames

Hi,

After some calculations I define a DataFrame to store my results and output them as a CSV file.

The cutoffs columns are vectors. After outputting them as a CSV file, I would like to read the file again and access the vectors as vectors, but they are stored as Strings.

Is there a way to tell DataFrames to store them as vectors, so that when I read them back I can access them as vectors? Or is there a way to read in the output as strings and convert them into vectors?

Thanks a lot,
M.

This is a very old question to which the answer is no: if you want to store additional structure with your data, don’t use CSV.

See e.g. this prior discussion:


If you want to use CSV as the storage format, I would advise against storing vector columns the way you are trying to. (You could, however, save this data as, e.g., a Parquet2.jl or Arrow.jl file.)
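To illustrate the Arrow.jl route, here is a minimal sketch. It assumes Arrow.jl and DataFrames.jl are installed, and `test.arrow` is just a placeholder file name. The columns read back are Arrow-backed and may be read-only, so you might need to copy them before mutating.

```julia
using Arrow, DataFrames

# Same shape as the data in question: one scalar column, one column of vectors.
df = DataFrame(x=rand(3), y=[rand(4) for i in 1:3])

# Write the data frame, vector column included ("test.arrow" is a placeholder name).
Arrow.write("test.arrow", df)

# Read it back; the elements of y are vectors of numbers again, not strings.
df_loaded = DataFrame(Arrow.Table("test.arrow"))

# Element-wise comparison of the stored vectors with the originals.
all(df_loaded.y .== df.y)
```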

With CSV, what you should do instead is the following, since CSV does not support storing vectors of vectors:

julia> df = DataFrame(x=rand(3), y=[rand(4) for i in 1:3])
3×2 DataFrame
 Row │ x         y
     │ Float64   Array…
─────┼─────────────────────────────────────────────
   1 │ 0.493174  [0.839034, 0.0397448, 0.766962, …
   2 │ 0.146369  [0.398974, 0.416861, 0.444297, 0…
   3 │ 0.729989  [0.33982, 0.629735, 0.313111, 0.…

julia> df.id = axes(df, 1) # add id column
Base.OneTo(3)

julia> df_to_save = flatten(df, :y) # flatten the data so that you can write it
12×3 DataFrame
 Row │ x         y          id
     │ Float64   Float64    Int64
─────┼────────────────────────────
   1 │ 0.493174  0.839034       1
   2 │ 0.493174  0.0397448      1
   3 │ 0.493174  0.766962       1
   4 │ 0.493174  0.955614       1
   5 │ 0.146369  0.398974       2
   6 │ 0.146369  0.416861       2
   7 │ 0.146369  0.444297       2
   8 │ 0.146369  0.765362       2
   9 │ 0.729989  0.33982        3
  10 │ 0.729989  0.629735       3
  11 │ 0.729989  0.313111       3
  12 │ 0.729989  0.220889       3

julia> CSV.write("test.csv", df_to_save) # save it
"test.csv"

julia> df_loaded = CSV.read("test.csv", DataFrame) # load it
12×3 DataFrame
 Row │ x         y          id
     │ Float64   Float64    Int64
─────┼────────────────────────────
   1 │ 0.493174  0.839034       1
   2 │ 0.493174  0.0397448      1
   3 │ 0.493174  0.766962       1
   4 │ 0.493174  0.955614       1
   5 │ 0.146369  0.398974       2
   6 │ 0.146369  0.416861       2
   7 │ 0.146369  0.444297       2
   8 │ 0.146369  0.765362       2
   9 │ 0.729989  0.33982        3
  10 │ 0.729989  0.629735       3
  11 │ 0.729989  0.313111       3
  12 │ 0.729989  0.220889       3

julia> combine(groupby(df_loaded, :id), :x => first, :y => Ref∘copy, keepkeys=false) # reverse the flattening - you might not need it, but in case you do this is how you can do it
3×2 DataFrame
 Row │ x_first   y_Ref_copy
     │ Float64   Array…
─────┼─────────────────────────────────────────────
   1 │ 0.493174  [0.839034, 0.0397448, 0.766962, …
   2 │ 0.146369  [0.398974, 0.416861, 0.444297, 0…
   3 │ 0.729989  [0.33982, 0.629735, 0.313111, 0.…
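
If you are stuck with an existing CSV file where the vectors were already written as strings, a fallback is to parse them by hand. This is only a sketch: `parse_vector` is a hypothetical helper name, and it assumes each cell looks like `[0.1, 0.2, 0.3]` (square brackets, comma-separated floats). It is fragile compared to the approaches above.

```julia
# Hypothetical helper: turn a string like "[0.5, 1.25, 2.0]" into a Vector{Float64}.
# Assumes square brackets around comma-separated floating-point numbers.
function parse_vector(s::AbstractString)
    # Drop the surrounding brackets and any outer spaces.
    inner = strip(s, ['[', ']', ' '])
    # "[]" leaves nothing to parse, so return an empty vector.
    isempty(inner) && return Float64[]
    # Split on commas, trim each piece, and parse it as a Float64.
    return parse.(Float64, strip.(split(inner, ',')))
end

parse_vector("[0.5, 1.25, 2.0]")  # returns [0.5, 1.25, 2.0]

# For a column of such strings (here a hypothetical df.y) you would broadcast:
# df.y = parse_vector.(df.y)
```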