Reading DataFrames from JLD2 files

The results of my simulations are saved as DataFrames in JLD2 files:

using DataFrames, DrWatson, Agents  # ABM comes from Agents.jl; struct2dict and safesave from DrWatson

struct Sim
    params::Dict
    adf::DataFrame
    model::ABM
end
# Running simulations and generating results...
...

sim = Sim(d, adf, model)

# convert struct to dict
sim2d = struct2dict(sim)

# save the dict to fname using DrWatson's safesave
safesave(fname, sim2d)

Until tonight, everything was working fine, but now I get the following warning when I try to load the data:

julia> log_file = jldopen(fnamef)
...
├─🔢 params
├─🔢 adf
└─🔢 model

julia> adf = log_file["adf"]
┌ Warning: saved type DataFrame is missing field metadata in workspace type; reconstructing
└ @ JLD2 ~/.julia/packages/JLD2/HnW0g/src/data/reconstructing_datatypes.jl:164

The DataFrames package is already loaded with a using statement, and the DataFrame data type is defined in the workspace. I am confused about what the warning means, what could be the source of the problem, and how to deal with it. I'd appreciate your comments and help.

I am working with Julia@1.8.2, JLD2@0.4.25 and DataFrames@1.4.1.

Perhaps it is this change:

DataFrame is now a mutable struct and has three new fields 
metadata, colmetadata, and allnotemetadata; 
this change makes DataFrame objects serialized under earlier 
versions of DataFrames.jl incompatible with 
version 1.4 (#3055)
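To see which fields the installed version defines, one can inspect the type directly. A minimal check, assuming DataFrames.jl 1.4 or later is installed; the three field names are the ones listed in the changelog entry above:

```julia
using DataFrames

# List the fields of the DataFrame type in the current session.
fields = fieldnames(DataFrame)
println(fields)

# The three fields added in DataFrames.jl 1.4 according to the changelog.
added = (:metadata, :colmetadata, :allnotemetadata)
println(all(f -> f in fields, added))  # expected true on DataFrames.jl >= 1.4
println(ismutabletype(DataFrame))      # expected true on DataFrames.jl >= 1.4
```

A file written under 1.3 holds a DataFrame without these fields, which is what the JLD2 warning "saved type DataFrame is missing field metadata in workspace type" reports.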

For anyone who needs to convert DataFrame objects from version 1.3 to version 1.4 (e.g. if you serialized your objects for short-term storage):

The easiest solution is to call Tables.columntable on DataFrame objects created under DataFrames.jl 1.3 and serialize the result. Next, upgrade DataFrames.jl to 1.4, deserialize the NamedTuple, and transform it back into a DataFrame.
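The two steps can be sketched as a pair of helpers. This is a minimal sketch under stated assumptions, not the method from the quoted post: export_columns and import_columns are hypothetical names, the Serialization standard library is assumed as the serializer, and Tables.jl is assumed to be added to the environment. Each helper is meant to run under a different DataFrames.jl version; calling both in one session here only demonstrates the round trip.

```julia
using DataFrames, Tables, Serialization

# Step 1 (run with the OLD DataFrames.jl, e.g. 1.3):
# Tables.columntable turns the DataFrame into a NamedTuple of column vectors,
# which does not depend on DataFrame's internal fields.
function export_columns(path::AbstractString, df::DataFrame)
    serialize(path, Tables.columntable(df))
    return path
end

# Step 2 (run with the NEW DataFrames.jl, e.g. 1.4):
# read the NamedTuple back and build a fresh DataFrame from it.
function import_columns(path::AbstractString)
    nt = deserialize(path)
    return DataFrame(nt)
end

# Round trip in a single session, only to show that the data survives.
df = DataFrame(a = [1, 2, 3], b = ["x", "y", "z"])
path = export_columns(tempname(), df)
df_back = import_columns(path)
println(df_back == df)  # true
```

The Serialization format is not intended for long-term storage across Julia versions, which fits the "short-term storage" case mentioned above.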

Great! That was indeed the problem. Downgrading DataFrames.jl solved the issue. Now I need to convert all of the saved DataFrames to 1.4! 🥴

But there is no hurry. Just set the compat entry for DataFrames in your project to ~1.3 and do the data conversion whenever you have some spare time…

Hi! Could you please explain in more detail what exactly should be done? What should be converted into what, and how? And what should be serialized, and how?

Perhaps convert the DataFrames to Tables using the old DataFrames package and serialize them as Tables?
Then, using the new DataFrames package, read the Tables from the JLD2 file and convert them into new DataFrames.


Like this? But there is no such format…

On old DataFrames.jl:

  1. to_save = Tables.columntable(old_dataframe)
  2. Serialize to_save.

Then on new DataFrames.jl:

  1. Deserialize it into from_save.
  2. new_dataframe = DataFrame(from_save)
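Since the thread stores everything in JLD2 files, the same steps can also go through JLD2 directly. A minimal sketch; save_as_columns and load_as_dataframe are hypothetical helper names, and "adf" is just the key used in the original question:

```julia
using DataFrames, JLD2, Tables

# Run with the OLD DataFrames.jl: store the table under `key` as a NamedTuple
# of columns. Mode "w" creates the file at `path`, overwriting any existing one.
function save_as_columns(path::AbstractString, key::AbstractString, df::DataFrame)
    jldopen(path, "w") do file
        file[key] = Tables.columntable(df)
    end
    return path
end

# Run with the NEW DataFrames.jl: read the NamedTuple and rebuild a DataFrame.
function load_as_dataframe(path::AbstractString, key::AbstractString)
    nt = jldopen(path, "r") do file
        file[key]
    end
    return DataFrame(nt)
end

# Example round trip (both calls in one session only for illustration).
df = DataFrame(t = [0.0, 0.5, 1.0], n = [10, 12, 15])
p = save_as_columns(tempname() * ".jld2", "adf", df)
println(load_as_dataframe(p, "adf") == df)  # true
```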

Here's a direct way to load "old" DataFrames with the new version.

Example:

  1. Create a file with old DataFrames:
pkg> activate --temp
pkg> add DataFrames#v1.3.6
pkg> add JLD2 DelimitedFiles
julia> using JLD2, DelimitedFiles, DataFrames
julia> data, header = readdlm(joinpath(dirname(pathof(DataFrames)),
                                              "..", "docs", "src", "assets", "iris.csv"),
                                     ',', header=true);

julia> iris_raw = DataFrame(data, vec(header))
julia> iris = identity.(iris_raw)
julia> jldsave("old_dataframes.jld2"; iris)

  2. Load it in a new session with new DataFrames:

pkg> activate --temp
pkg> add JLD2 DataFrames
julia> using JLD2, DataFrames
julia> function Base.convert(::Type{DataFrame}, nt::NamedTuple)
           return DataFrame(nt.columns, nt.colindex)
       end

julia> load("old_dataframes.jld2", "iris"; 
    typemap = Dict("DataFrames.DataFrame" => JLD2.Upgrade(DataFrame))
    )

150×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species
     │ Float64      Float64     Float64      Float64     SubStrin…
─────┼──────────────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  Iris-setosa
   2 │         4.9         3.0          1.4         0.2  Iris-setosa
   3 │         4.7         3.2          1.3         0.2  Iris-setosa
   4 │         4.6         3.1          1.5         0.2  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 147 │         6.3         2.5          5.0         1.9  Iris-virginica
 148 │         6.5         3.0          5.2         2.0  Iris-virginica
 149 │         6.2         3.4          5.4         2.3  Iris-virginica
 150 │         5.9         3.0          5.1         1.8  Iris-virginica
                                                        142 rows omitted

(Note that the docstring of the DataFrame constructor I used here explicitly states that it shouldn't be used directly…)

What's happening here is that the typemap argument tells JLD2 to watch out for stored structs named "DataFrames.DataFrame" and to use a special reconstruction method for them:
load the fields as a named tuple and call convert on it to get a DataFrame.
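The mechanism can be tried out without installing two DataFrames.jl versions by using a toy struct whose newer version gains a field. A minimal sketch, assuming a JLD2 version whose load accepts the Dict form of typemap with JLD2.Upgrade, as in the example above; PointV1, PointV2 and the "unlabelled" default are made up for illustration. If your JLD2 version documents JLD2.rconvert as the hook for upgrades, defining that method instead of Base.convert should work as well.

```julia
using JLD2

# Layout of the struct as it was stored in the file (the "old" version).
struct PointV1
    x::Float64
end

# Newer layout with an extra field, analogous to DataFrame gaining metadata fields.
struct PointV2
    x::Float64
    label::String
end

# With JLD2.Upgrade, the stored fields arrive as a NamedTuple, e.g. (x = 1.5,),
# which is then converted to the target type.
Base.convert(::Type{PointV2}, nt::NamedTuple) = PointV2(nt.x, "unlabelled")

path = tempname() * ".jld2"
jldsave(path; p = PointV1(1.5))

# The typemap key is the stored type name including its module, e.g. "Main.PointV1".
stored_name = string(parentmodule(PointV1), ".", nameof(PointV1))
p2 = load(path, "p"; typemap = Dict(stored_name => JLD2.Upgrade(PointV2)))
println(p2)
```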

Thanks, that works!

using FileIO, JLD2, Tables

# store the DataFrame evt_e as a NamedTuple of columns
to_save1 = Tables.columntable(evt_e)

file1 = File(format"JLD2", "data/evt_e.jld2")

save(file1, "1", to_save1)