Reading DataFrames from JLD2 files

JonasIsensee · January 13, 2023, 2:33pm

Here’s a direct way to load DataFrames saved with an “old” version of DataFrames.jl using the new version.

Example:

Create a file with the old DataFrames version:

pkg> activate --temp
pkg> add DataFrames#v1.3.6
pkg> add JLD2 DelimitedFiles
julia> using JLD2, DelimitedFiles, DataFrames
julia> data, header = readdlm(joinpath(dirname(pathof(DataFrames)),
                                       "..", "docs", "src", "assets", "iris.csv"),
                              ',', header=true);   # iris.csv from the DataFrames.jl docs assets

julia> iris_raw = DataFrame(data, vec(header))   # columns are Vector{Any} here
julia> iris = identity.(iris_raw)                # narrow each column to a concrete eltype
julia> jldsave("old_dataframes.jld2"; iris)

Loading the file in a new session with the new DataFrames version:

pkg> activate --temp
pkg> add JLD2 DataFrames
julia> using JLD2, DataFrames
julia> function Base.convert(::Type{DataFrame}, nt::NamedTuple)
           # nt holds the fields of the stored (old) DataFrame: columns and colindex
           return DataFrame(nt.columns, nt.colindex)
       end

julia> load("old_dataframes.jld2", "iris";
            typemap = Dict("DataFrames.DataFrame" => JLD2.Upgrade(DataFrame)))

150×5 DataFrame
 Row │ SepalLength  SepalWidth  PetalLength  PetalWidth  Species        
     │ Float64      Float64     Float64      Float64     SubStrin…      
─────┼──────────────────────────────────────────────────────────────────
   1 │         5.1         3.5          1.4         0.2  Iris-setosa
   2 │         4.9         3.0          1.4         0.2  Iris-setosa
   3 │         4.7         3.2          1.3         0.2  Iris-setosa
   4 │         4.6         3.1          1.5         0.2  Iris-setosa
  ⋮  │      ⋮           ⋮            ⋮           ⋮             ⋮
 147 │         6.3         2.5          5.0         1.9  Iris-virginica
 148 │         6.5         3.0          5.2         2.0  Iris-virginica
 149 │         6.2         3.4          5.4         2.3  Iris-virginica
 150 │         5.9         3.0          5.1         1.8  Iris-virginica
                                                        142 rows omitted

(Note that the constructor I used here, DataFrame(columns, colindex), is internal: DataFrames.jl explicitly states that it shouldn’t be called directly.)

What’s happening here: the fields stored for an old DataFrame no longer match the current DataFrame definition, so the typemap keyword tells JLD2 to look out for stored structs named "DataFrames.DataFrame" and to reconstruct them in a special way. It loads their fields as a NamedTuple and then calls convert to turn that NamedTuple into a DataFrame.
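To make those two steps concrete, here is a toy model in plain Julia. It is not JLD2's implementation: UpgradeTo, PointV2, toy_upgrade and toy_load are hypothetical names made up for this sketch, and the "file" is just a NamedTuple of field values. It only mimics the idea described above: look up the stored type name in a typemap, then convert the NamedTuple of fields into the new type.

```julia
# Toy model, NOT JLD2's implementation: every name below is hypothetical.

# Stand-in for JLD2.Upgrade: it only records which type to upgrade to.
struct UpgradeTo{T} end

# Hypothetical "new" version of a type whose old stored layout had only x and y.
struct PointV2
    x::Float64
    y::Float64
    label::String  # field the old layout did not have
end

# User-supplied conversion from the old fields, analogous to the convert method for DataFrame.
Base.convert(::Type{PointV2}, nt::NamedTuple) = PointV2(nt.x, nt.y, "")

# Rebuild a value from its stored fields by converting the NamedTuple to the target type.
toy_upgrade(::UpgradeTo{T}, fields::NamedTuple) where {T} = convert(T, fields)

# Look up the type name recorded in the "file"; error if the typemap has no entry for it.
function toy_load(stored_name::AbstractString, fields::NamedTuple, typemap::AbstractDict)
    haskey(typemap, stored_name) || error("no typemap entry for stored type $stored_name")
    return toy_upgrade(typemap[stored_name], fields)
end

typemap = Dict("MyPkg.Point" => UpgradeTo{PointV2}())
p = toy_load("MyPkg.Point", (x = 1.0, y = 2.0), typemap)   # → PointV2(1.0, 2.0, "")
```

In the real load call, JLD2.Upgrade(DataFrame) plays the role of UpgradeTo{PointV2}(), and the convert method defined for DataFrame plays the role of the one defined for PointV2.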