Reading DataFrames from JLD2 files

ehsani63 · October 10, 2022, 11:25pm

The results of my simulations are saved as DataFrames in JLD2 files:

struct Sim
    params::Dict
    adf::DataFrame
    model::ABM
end
# Running simulations and generating results...
...

sim = Sim(d, adf, model)

# convert struct to dict
sim2d = struct2dict(sim)

# save the dict to name, using DrWatson safesave
safesave(fname, sim2d)

Until tonight, everything was working fine, but now I get the following warning when I try to load the data:

julia> log_file = jldopen(fnamef)
...
├─🔢 params
├─🔢 adf
└─🔢 model

julia> adf = log_file["adf"]
┌ Warning: saved type DataFrame is missing field metadata in workspace type; reconstructing
└ @ JLD2 ~/.julia/packages/JLD2/HnW0g/src/data/reconstructing_datatypes.jl:164

The DataFrames package is already loaded via the using command, and the DataFrame datatype is defined in the workspace. I am confused about the meaning of the warning: what could be the source of the problem, and how should I deal with it? I appreciate your comments and help.

I am working with Julia@1.8.2, JLD2@0.4.25 and DataFrames@1.4.1.

ufechner7 · October 10, 2022, 11:29pm

Perhaps it is this change:

DataFrame is now a mutable struct and has three new fields 
metadata, colmetadata, and allnotemetadata; 
this change makes DataFrame objects serialized under earlier 
versions of DataFrames.jl incompatible with 
version 1.4 (#3055)

For anyone needing to convert DataFrame objects from 1.3 to 1.4 version (e.g. if you serialized your objects for short-term storage).

The easiest solution is to use Tables.columntable on DataFrame objects created under DataFrames.jl 1.3 version. Then serialize them. Next upgrade DataFrames.jl to 1.4. Deserialize the NamedTuple, and next transform it back to DataFrame.

ehsani63 · October 10, 2022, 11:38pm

Great! That was indeed the problem. Downgrading the DataFrames solved the issue. Now I need to convert all of the saved DataFrames to 1.4!

ufechner7 · October 10, 2022, 11:41pm

But there is no hurry. Just set the compat entry of DataFrames of your project to ~1.3 and make the data conversion whenever you have some spare time…

vladdez · January 12, 2023, 12:59pm

Hi! Could you please elaborate more thoroughly what exactly should be done? What should be converted to what and how? And what and how should be serialized?

ufechner7 · January 12, 2023, 1:04pm

Perhaps convert the dataframes to Tables using the old dataframe package and serialize them as Tables?
Then, using the new DataFrames package read the Tables from the jld2 file and convert them into the new DataFrames.

vladdez · January 12, 2023, 2:16pm

Like this? But there is no such format…

bkamins · January 12, 2023, 2:27pm

On old DataFrames.jl:

to_save = Tables.columntable(old_dataframe)
# serialize to_save

Then on new DataFrames.jl:

# deserialize into from_save
new_dataframe = DataFrame(from_save)
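
The two steps above can be sketched with JLD2 as the storage format. This is only an illustration under some assumptions: the sample data, the file name converted.jld2 and the temporary directory are hypothetical, Tables.jl is assumed to be installed in the environment, and both steps are shown in one script even though in practice step 1 would run with DataFrames.jl 1.3 and step 2 with 1.4.

```julia
using DataFrames, JLD2, Tables

# Hypothetical stand-in for a DataFrame created under DataFrames.jl 1.3.
old_dataframe = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])

# Hypothetical output path; a temporary directory keeps the sketch self-contained.
path = joinpath(mktempdir(), "converted.jld2")

# Step 1 (old DataFrames.jl): store a plain NamedTuple of column vectors.
to_save = Tables.columntable(old_dataframe)
jldsave(path; to_save)

# Step 2 (new DataFrames.jl): read the NamedTuple and rebuild the DataFrame.
from_save = load(path, "to_save")
new_dataframe = DataFrame(from_save)
```

Because the file then holds only a NamedTuple of vectors, it should not depend on the internal field layout of DataFrame, which is what changed between 1.3 and 1.4.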

JonasIsensee · January 13, 2023, 2:33pm

Here’s a direct way to load “old” DataFrames with the new version.

Example:

create file with old DataFrames:

pkg> activate --temp
pkg> add DataFrames#v1.3.6
pkg> add JLD2 DelimitedFiles
julia> using JLD2, DelimitedFiles, DataFrames
julia> data, header = readdlm(joinpath(dirname(pathof(DataFrames)),
                                              "..", "docs", "src", "assets", "iris.csv"),
                                     ',', header=true);

julia> iris_raw = DataFrame(data, vec(header))
julia> iris = identity.(iris_raw)
julia> jldsave("old_dataframes.jld2"; iris)

Loading this in a new session with new DataFrames:

pkg> activate --temp
pkg> add JLD2 DataFrames
julia> using JLD2, DataFrames
julia> function Base.convert(::Type{DataFrame}, nt::NamedTuple)
           return DataFrame(nt.columns, nt.colindex)
       end

julia> load("old_dataframes.jld2", "iris"; 
    typemap = Dict("DataFrames.DataFrame" => JLD2.Upgrade(DataFrame))
    )

150×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species        
     │ Float64      Float64     Float64      Float64     SubStrin…      
─────┼──────────────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  Iris-setosa
   2 │         4.9         3.0          1.4         0.2  Iris-setosa
   3 │         4.7         3.2          1.3         0.2  Iris-setosa
   4 │         4.6         3.1          1.5         0.2  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 147 │         6.3         2.5          5.0         1.9  Iris-virginica
 148 │         6.5         3.0          5.2         2.0  Iris-virginica
 149 │         6.2         3.4          5.4         2.3  Iris-virginica
 150 │         5.9         3.0          5.1         1.8  Iris-virginica
                                                        142 rows omitted

(Note that the DataFrame constructor I used here explicitly states that it shouldn’t be used directly…)

What’s happening here is that the typemap argument tells JLD2 to watch for stored structs named "DataFrames.DataFrame" and to use a special reconstruction method for them: load the fields as a named tuple, then call convert to get a DataFrame.

vladdez · January 16, 2023, 12:58pm

thanks, that works!

to_save1 = Tables.columntable(evt_e)

file1 = File(format"JLD2", "data/evt_e.jld2")

save(file1, "1", to_save1)
