Outputting/Inputting vectors in DataFrames

mb96 · January 17, 2023, 2:47pm

Hi,

After some calculations I define a DataFrame to store my results and output them as a CSV file.

The cutoffs columns contain vectors. After writing them to a CSV file, I would like to read the file back and access them as vectors, but they are read in as Strings.

Is there a way to tell DataFrames to store them as vectors, so that when I read them back I can access them as vectors? Or is there a way to read the output in as strings and convert them into vectors?

Thanks a lot,
M.

nilshg · January 17, 2023, 3:02pm

This is a very old question, and the answer is no: if you want to store additional structure with your data, don’t use CSV.

See e.g. this prior discussion:

bkamins · January 17, 2023, 3:03pm

If you want to use CSV as a storage format, I would advise against storing vectors in a column the way you are doing now (you could, however, save this data as, e.g., a Parquet file with Parquet2.jl or an Arrow file with Arrow.jl).
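As a rough illustration of the Arrow.jl route mentioned above, here is a minimal sketch. It assumes the Arrow and DataFrames packages are installed. The file name and the small example data are made up for illustration. The loaded elements may be Arrow-backed views rather than plain `Vector`s, but they compare equal to the originals.

```julia
using Arrow, DataFrames

# Small example table with a vector-valued column (made-up data).
df = DataFrame(x = [1.0, 2.0], y = [[0.1, 0.2], [0.3, 0.4, 0.5]])

# Hypothetical file location inside a temporary directory.
path = joinpath(mktempdir(), "test.arrow")

# Arrow keeps the nested structure, so no flattening is needed.
Arrow.write(path, df)

# Read the file back and wrap it in a DataFrame.
df_loaded = DataFrame(Arrow.Table(path))

# Each element of y is again a vector of Float64 values.
println(df_loaded.y[2] == [0.3, 0.4, 0.5])
```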

In CSV what you should do is the following (as CSV does not support storing vectors of vectors):

julia> using CSV, DataFrames

julia> df = DataFrame(x=rand(3), y=[rand(4) for i in 1:3])
3×2 DataFrame
 Row │ x         y
     │ Float64   Array…
─────┼─────────────────────────────────────────────
   1 │ 0.493174  [0.839034, 0.0397448, 0.766962, …
   2 │ 0.146369  [0.398974, 0.416861, 0.444297, 0…
   3 │ 0.729989  [0.33982, 0.629735, 0.313111, 0.…

julia> df.id = axes(df, 1) # add id column
Base.OneTo(3)

julia> df_to_save = flatten(df, :y) # flatten the data so that you can write it
12×3 DataFrame
 Row │ x         y          id
     │ Float64   Float64    Int64
─────┼────────────────────────────
   1 │ 0.493174  0.839034       1
   2 │ 0.493174  0.0397448      1
   3 │ 0.493174  0.766962       1
   4 │ 0.493174  0.955614       1
   5 │ 0.146369  0.398974       2
   6 │ 0.146369  0.416861       2
   7 │ 0.146369  0.444297       2
   8 │ 0.146369  0.765362       2
   9 │ 0.729989  0.33982        3
  10 │ 0.729989  0.629735       3
  11 │ 0.729989  0.313111       3
  12 │ 0.729989  0.220889       3

julia> CSV.write("test.csv", df_to_save) # save it
"test.csv"

julia> df_loaded = CSV.read("test.csv", DataFrame) # load it
12×3 DataFrame
 Row │ x         y          id
     │ Float64   Float64    Int64
─────┼────────────────────────────
   1 │ 0.493174  0.839034       1
   2 │ 0.493174  0.0397448      1
   3 │ 0.493174  0.766962       1
   4 │ 0.493174  0.955614       1
   5 │ 0.146369  0.398974       2
   6 │ 0.146369  0.416861       2
   7 │ 0.146369  0.444297       2
   8 │ 0.146369  0.765362       2
   9 │ 0.729989  0.33982        3
  10 │ 0.729989  0.629735       3
  11 │ 0.729989  0.313111       3
  12 │ 0.729989  0.220889       3

julia> combine(groupby(df_loaded, :id), :x => first, :y => Ref∘copy, keepkeys=false) # reverse the flattening; you might not need this step, but this is how to do it
3×2 DataFrame
 Row │ x_first   y_Ref_copy
     │ Float64   Array…
─────┼─────────────────────────────────────────────
   1 │ 0.493174  [0.839034, 0.0397448, 0.766962, …
   2 │ 0.146369  [0.398974, 0.416861, 0.444297, 0…
   3 │ 0.729989  [0.33982, 0.629735, 0.313111, 0.…
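For the second part of the original question (reading the column as strings and converting afterwards), the following is only a sketch, not a recommended storage scheme. It assumes each cell holds the default printed form of a numeric vector, such as `"[0.1, 0.2]"`. The helper name `parse_float_vector` is hypothetical and not part of DataFrames.jl or CSV.jl.

```julia
# Hypothetical helper: turn the printed form of a numeric vector,
# e.g. "[0.1, 0.2, 0.3]", back into a Vector{Float64}.
function parse_float_vector(s::AbstractString)
    start = findfirst('[', s)
    stop = findlast(']', s)
    if start === nothing || stop === nothing || start > stop
        throw(ArgumentError("not a bracketed vector: $s"))
    end
    # Take the text between the brackets, using string-safe index arithmetic.
    inner = strip(s[nextind(s, start):prevind(s, stop)])
    isempty(inner) && return Float64[]
    return [parse(Float64, strip(part)) for part in split(inner, ',')]
end

# Example cells as they might appear after reading the CSV as strings.
cells = ["[0.5, 0.25, 1.0]", "[2.0]", "Float64[]"]
vectors = parse_float_vector.(cells)
```

With a loaded data frame, the same broadcast could be applied to the string column, for example `df.cutoffs = parse_float_vector.(df.cutoffs)`, where `cutoffs` stands in for whatever the column is actually called.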
