DataSkimmer.jl exposes a function, skim(), which prints summary statistics in the REPL. It was inspired by the output of the skimr R package.
The goal is to be able to summarise any Tables.jl-compatible table, so if you want to help, try running skim() on any tables you have and send me examples where it breaks.
Here's an example using the iris dataset:
# Load some data
using RDatasets
iris = RDatasets.dataset("datasets", "iris")
# Skim the data
using DataSkimmer
skim(iris)
┌─────────────────────┬───────────┐
│ Type                │ DataFrame │
│ N. rows             │ 150       │
│ N. cols             │ 5         │
│ N. numeric cols     │ 4         │
│ N. categorical cols │ 1         │
│ N. datetime cols    │ 0         │
└─────────────────────┴───────────┘
4 numeric columns
┌─────────────┬─────────┬──────────┬──────────┬──────┬──────┬──────┬──────┬──────┬───────┐
│ Name        │ Type    │ Missings │ Complete │ Mean │ Std. │ Min. │ Med. │ Max. │ Hist. │
├─────────────┼─────────┼──────────┼──────────┼──────┼──────┼──────┼──────┼──────┼───────┤
│ SepalLength │ Float64 │ 0        │ 100.0%   │ 5.84 │ 0.83 │ 4.3  │ 5.8  │ 7.9  │ …     │
│ SepalWidth  │ Float64 │ 0        │ 100.0%   │ 3.06 │ 0.44 │ 2.0  │ 3.0  │ 4.4  │ …     │
│ PetalLength │ Float64 │ 0        │ 100.0%   │ 3.76 │ 1.77 │ 1.0  │ 4.35 │ 6.9  │ …     │
│ PetalWidth  │ Float64 │ 0        │ 100.0%   │ 1.2  │ 0.76 │ 0.1  │ 1.3  │ 2.5  │ …     │
└─────────────┴─────────┴──────────┴──────────┴──────┴──────┴──────┴──────┴──────┴───────┘
1 categorical column
┌─────────┬────────────────────────────────┬──────────┬──────────┐
│ Name    │ Type                           │ Missings │ Complete │
├─────────┼────────────────────────────────┼──────────┼──────────┤
│ Species │ CategoricalValue{String,UInt8} │ 0        │ 100.0%   │
└─────────┴────────────────────────────────┴──────────┴──────────┘
No datetime columns
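If you want to try skim() on something other than a DataFrame, the following is a minimal sketch of one way to do it. It assumes that skim() accepts a plain NamedTuple of vectors, which is a valid Tables.jl column table; that is the stated goal of the package rather than something this example guarantees. The table contents and column names (id, score, label) are made up for illustration.

```julia
using DataSkimmer

# A NamedTuple of equal-length vectors is a valid Tables.jl column table.
# The data here is hypothetical and only meant to exercise skim().
tbl = (id = [1, 2, 3], score = [0.5, missing, 2.0], label = ["a", "b", "c"])

try
    skim(tbl)
catch err
    # If this branch runs, the table above plus the logged error
    # make a small, reproducible example to send as a bug report.
    @warn "skim() failed on this table" exception = (err, catch_backtrace())
end
```

Wrapping the call in try/catch keeps the session going when skim() throws, and the logged exception and backtrace can be copied straight into a report.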