I serialized a DataFrame from DataFrames 1.3. Can I convert it to be compatible with DataFrames 1.4 so I can load it in an environment with the new version?
One option is to read it in and write it back out as an Arrow file, then read that file with the newer DataFrames.
This dataframe has complicated column types that aren’t natively supported by Arrow, so I’d have to write a lot of extra code to make that work.
Arrow should support self-descriptive structs: the file carries metadata describing them, so as long as your struct definition can be loaded again, you can load the same Arrow file again.
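A minimal sketch of what that could look like, assuming Arrow.jl is installed and using its `ArrowTypes` extension interface. The `Point` struct and the `:Point` tag are hypothetical names chosen for illustration, not anything from the original data:

```julia
using Arrow

# Hypothetical custom element type stored in a column.
struct Point
    x::Float64
    y::Float64
end

# Tag written into the Arrow metadata, and the reverse mapping used on read.
Arrow.ArrowTypes.arrowname(::Type{Point}) = :Point
Arrow.ArrowTypes.JuliaType(::Val{:Point}) = Point

path = tempname()
# A NamedTuple of vectors is a valid Tables.jl column table.
Arrow.write(path, (p = [Point(1.0, 2.0), Point(3.0, 4.0)],))

tbl = Arrow.Table(path)
restored = collect(tbl.p)   # elements are rebuilt as Point values
```

The same two method definitions have to be present in the session that reads the file; without them the column comes back as generic structured values rather than `Point`. More complicated types (abstract fields, nested containers) may need more work than this.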
Do the following:
- convert the data frame using `Tables.columntable`
- serialize the produced `NamedTuple`
- update to DataFrames.jl 1.4
- deserialize the produced file and pass the result to `DataFrame`
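The reason these steps work is that the intermediate value is a plain `NamedTuple` of vectors, which does not depend on DataFrames.jl internals. A minimal sketch using only the standard library, with a hand-written `NamedTuple` standing in for the output of `Tables.columntable(df)` (the column names and the temporary path are made up for illustration):

```julia
using Serialization

# Stand-in for Tables.columntable(df): column names mapped to vectors.
cols = (id = [1, 2, 3], label = ["a", "b", "c"])

path = tempname()
serialize(path, cols)          # done in the old environment
restored = deserialize(path)   # done in the new environment
```

Two caveats: any custom element types stored in the columns still have to be defined in the session that deserializes, and the `Serialization` format is not guaranteed to be readable across Julia versions, so both steps should be run with the same Julia version.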
@bkamins is there a programmatic way to do this, if I have a couple of dozen JLD files based on DataFrames 1.3?
Under DataFrames.jl 1.3, for all files you have:
1. load the `filename.jld` JLD file to `df`
2. `serialize("filename.bin", Tables.columntable(df))`

Now update to DataFrames.jl 1.4 and for all files you have:
1. `df = DataFrame(deserialize("filename.bin"))`
2. save `df` as a JLD file
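For a couple of dozen files, the first half can be wrapped in a loop. The sketch below is one possible shape, not a tested recipe: `bin_path` and `export_all` are hypothetical helper names, and the loader is passed in as a function so that the loop itself only needs the standard library.

```julia
using Serialization

# Map "data/a.jld" to "data/a.bin".
bin_path(jld_path) = splitext(jld_path)[1] * ".bin"

# Apply `load_table` to every .jld file in `dir` and serialize the result
# next to it. Returns the list of .bin files written.
function export_all(load_table, dir)
    written = String[]
    for f in readdir(dir; join=true)
        endswith(f, ".jld") || continue
        out = bin_path(f)
        serialize(out, load_table(f))
        push!(written, out)
    end
    return written
end

# Under DataFrames.jl 1.3, assuming each file stores its data frame as "df"
# and the files live in a directory called "data":
#   using DataFrames, JLD
#   export_all(f -> Tables.columntable(load(f, "df")), "data")
```

The DataFrames.jl 1.4 side is the mirror image: loop over the `.bin` files, call `DataFrame(deserialize(f))` on each, and save the result with JLD.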
Thanks. Do I need to restart Julia after/before updating the DataFrames pkg? I think so.
EDIT: I note that I disabled Revise, as it gave me tons of complaints.
Yes, you need to restart Julia.
While this is a solution, it (or converting to Arrow) seems very inconvenient, even when done programmatically. Couldn’t DataFrames.jl 1.4 (or 1.5?) do the conversion for you when you open older data (by default, or as an opt-in if that is problematic)? What is done in other languages, e.g. pandas (or pickle)? Do they just support data saved by older versions?
Along the lines of your idea: I keep thinking that a smart person would be able to start up multiple Julia sessions that talk to each other (the most primitive variant would be through binary files on disk), one using DataFrames 1.3 and the other using DataFrames 1.4.
What you ask for cannot be done in DataFrames.jl as it is not DataFrames.jl that “opens” the stored data. It is JLD.jl in this case that would need to add such an extension (which is not very likely I assume).
Indeed the process is a bit cumbersome, but it is a one-time action. Here are scripts that do all that is needed. I assume that you want to convert "old.jld" to "new.jld" (not tested; I have written it from memory).
```shell
mkdir old
mkdir new
cd old
julia old.jl
julia new.jl
```
```julia
# old.jl: run with DataFrames.jl 1.3
using Pkg
Pkg.activate("old")
Pkg.add(name="DataFrames", version="1.3.6")
Pkg.add(name="JLD", version="XXXX") # JLD version you used to save data
using DataFrames
using JLD
using Serialization
# assumes the data frame was saved under the name "df"
df = load("old.jld", "df")
serialize("df.bin", Tables.columntable(df))
```
```julia
# new.jl: run with DataFrames.jl 1.4
using Pkg
Pkg.activate("new")
Pkg.add(name="DataFrames", version="1.4.2")
Pkg.add(name="JLD", version="XXXX") # JLD version you want to use to save data with
using DataFrames
using JLD
using Serialization
df = DataFrame(deserialize("df.bin")) # or whatever name you want to use
save("new.jld", "df", df)
```
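The two scripts can also be launched from a single Julia session, which is close to the "multiple sessions talking through files on disk" idea mentioned earlier. A rough sketch, where `run_in_env` is a hypothetical helper and the environment and script names are assumptions:

```julia
# Run `script` in a fresh Julia process that uses the project in `env`.
# Each process loads its own package versions, so no restart is needed
# in the session that calls this function.
function run_in_env(env, script)
    julia = Base.julia_cmd()
    run(`$julia --project=$env $script`)
end

# Hypothetical usage, assuming environments "old" and "new" already have
# the right DataFrames.jl versions installed:
#   run_in_env("old", "old.jl")
#   run_in_env("new", "new.jl")
```

Scripts that call `Pkg.activate` themselves switch to that environment regardless of the `--project` flag, so in that case the flag is redundant but harmless.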
You really should be using Arrow.jl; then you just get a DataFrame back every time.
I assume @bernhard uses JLD because of custom columns that Arrow.jl cannot represent properly.
OK, I see it wasn’t strictly a problem with DataFrames.jl; I had assumed from “I serialized a DataFrame from DataFrames 1.3” that some DataFrames function was used. It’s a one-time thing now, until maybe a later version. And if it’s a problem with Julia’s (or JLD’s) serialization, it is even more important to know whether something better can be done with it, or whether to avoid the problem by not using it, and then what to use instead.
I recall that pandas has a function to store dataframes. I’m still curious whether this is a problem in other languages. Should DataFrames.jl have such a function that would just work?