[ANN]: JDF.jl v0.2.3 - DataFrames serialization format for Julia

See the GitHub repository: xiaodaigh/JDF.jl (Julia DataFrames serialization format)

JDF is the Julia DataFrames serialization format. Because it is a specialised serialization format, it doesn't support arbitrary Julia objects the way JLD2 and JLSO do. However, for the types it does support, it typically produces smaller files and faster saves and loads, and it is less likely to run into hard-to-resolve bugs.

New functionality

On-disk access to metadata

One can now refer to a JDF file on disk and obtain metadata from it, including the column names, nrow and ncol.

For example:

using DataFrames, JDF
df = DataFrame(a = 1:3, b = 1:3, c = 1:3)
savejdf(df, "df.jdf")

df_ondisk = jdf"df.jdf"

names(df_ondisk)

nrow(df_ondisk)

ncol(df_ondisk)

Selective loading of columns

Use the cols keyword argument of loadjdf to load only selected columns. For example, to load columns :a and :c:

df_cols_a_c = loadjdf("df.jdf", cols = [:a, :c])

eachcol iterator directly from disk

JDF lets you iterate through the columns one at a time without loading the whole DataFrame into RAM:

for col in eachcol(df_ondisk)
  # col is the column
end

To iterate through each column name together with its corresponding column:

for (name, col) in zip(names(df_ondisk), eachcol(df_ondisk))
  # name is the column name, col is the column
end
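As a rough sketch of how the on-disk iterator might be used, the example below sums every column by walking through them one at a time. It reuses only the calls shown in this post (savejdf, the jdf"..." string macro, names and eachcol); the file name df_sums.jdf and the col_sums variable are illustrative, and the claim that only one column is held in memory per iteration is taken from the description above rather than measured.

```julia
using DataFrames, JDF

# Small example table, saved with the same call pattern used earlier in this post
df = DataFrame(a = 1:3, b = 4:6, c = 7:9)
savejdf(df, "df_sums.jdf")

# Refer to the saved file on disk
df_ondisk = jdf"df_sums.jdf"

# Sum each column; per the announcement, eachcol reads columns from disk one at a time
col_sums = Dict{String, Int}()
for (name, col) in zip(names(df_ondisk), eachcol(df_ondisk))
    # string(name) works whether names returns Symbols or Strings
    col_sums[string(name)] = sum(col)
end

println(col_sums)
```

The same loop shape works for any per-column reduction (means, counts of missing values, etc.) where the full table does not need to be in RAM at once.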

New supported types

This version adds support for the following data types:

  • Char
  • TimeZones.ZonedDateTime (experimental)
  • Symbol

in addition to the types it already supported, which include

  • WeakRefStrings.StringVector
  • Vector{T}, Vector{Union{Missing, T}}, Vector{Union{Nothing, T}}
  • CategoricalArrays.CategoricalVector{T}

where T can be String, Bool, Symbol, Char, TimeZones.ZonedDateTime (experimental), or an isbits type, i.e. UInt*, Int*, Float*, Date* types, etc.
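A minimal round-trip check for some of the types listed above might look like the following. It assumes the savejdf/loadjdf calls shown earlier in this post; the file name types.jdf and the column names are illustrative only, and TimeZones.ZonedDateTime is left out because its support is marked experimental.

```julia
using DataFrames, JDF

# One column per newly supported element type, plus a column containing missing values
df = DataFrame(
    ch  = ['x', 'y', 'z'],   # Char
    sym = [:p, :q, :r],      # Symbol
    m   = [1, missing, 3],   # Vector{Union{Missing, Int}}
)

savejdf(df, "types.jdf")
df2 = loadjdf("types.jdf")

# isequal treats missing as equal to missing, unlike ==, so it suits round-trip checks
same = all(isequal(df[!, c], df2[!, c]) for c in [:ch, :sym, :m])
println(same)
```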

See previous announcement
