See the GitHub repository xiaodaigh/JDF.jl: Julia DataFrames serialization format
JDF is the Julia DataFrames serialization format. It's a specialised serialization format and hence, unlike JLD2 and JLSO, doesn't support arbitrary objects. However, for the types it does support, it typically produces smaller files, saves and loads faster, and is less likely to have hard-to-resolve bugs.
New functionality
On-disk access to metadata
One can now refer to a JDF file on disk and obtain some metadata from it, including the column names, nrow and ncol.
E.g.
using DataFrames, JDF
df = DataFrame(a = 1:3, b = 1:3, c = 1:3)
savejdf(df, "df.jdf")
df_ondisk = jdf"df.jdf"  # refers to the JDF file on disk
names(df_ondisk)         # column names
nrow(df_ondisk)          # number of rows
ncol(df_ondisk)          # number of columns
Selective loading of columns
One can use the cols argument in loadjdf to load selected columns. For example, to load columns :a and :c, we do
df_cols_a_c = loadjdf("df.jdf", cols = [:a, :c])
eachcol iterator directly from disk
Without loading the whole DataFrame into RAM, JDF allows you to iterate through each column:
for col in eachcol(df_ondisk)
# col is the column
end
To iterate through the names and their corresponding columns, we do
for (name, col) in zip(names(df_ondisk), eachcol(df_ondisk))
    # name is the column name, col is the column
end
New supported types
This version adds support for a few new data types, including
- Char
- TimeZones.ZonedDateTime (experimental)
- Symbol
in addition to the types it already supported, which include
- WeakRefStrings.StringVector
- Vector{T}, Vector{Union{Missing, T}}, Vector{Union{Nothing, T}}
- CategoricalArrays.CategoricalVector{T}

where T can be String, Bool, Symbol, Char, TimeZones.ZonedDateTime (experimental), and isbits types, i.e. UInt*, Int*, Float*, Date* types, etc.
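To check that a particular mix of column types survives a save/load cycle, a quick round trip like the one below can help. This is a minimal sketch, assuming the savejdf(df, path) argument order used earlier in this post and that loadjdf returns something DataFrame(...) can wrap (a DataFrame or a Tables.jl-compatible table). The column names, example data and temporary path are hypothetical, chosen only to exercise a Char column, a Vector{Union{Missing, T}} column and a CategoricalVector{String} column.

```julia
# Round-trip sketch: assumes savejdf(df, path) / loadjdf(path) as used above.
using DataFrames, JDF, CategoricalArrays

# Hypothetical data covering a few of the listed column types
df = DataFrame(
    ch  = ['a', 'b', 'c'],               # Vector{Char}, new in this version
    num = [1, missing, 3],               # Vector{Union{Missing, Int}}
    cat = categorical(["x", "y", "x"]),  # CategoricalVector{String}
)

path = joinpath(mktempdir(), "types.jdf")  # hypothetical file location
savejdf(df, path)

# Wrapping in DataFrame(...) works whether loadjdf returns a DataFrame
# or another Tables.jl-compatible table (an assumption about the return type)
df2 = DataFrame(loadjdf(path))

# isequal (rather than ==) treats two missing values as equal
println(isequal(df2.ch, df.ch))
println(isequal(df2.num, df.num))
println(isequal(df2.cat, df.cat))
```

If any of these comparisons is false, that column type is a candidate for a bug report against the package.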