Note that many table types already overload common Julia functions such as `map`. This shared API makes it very convenient and consistent to work with them. Add DataPipes.jl (disclaimer: my package) on top to reduce syntactic boilerplate, and it turns out that basically all operations from the first post are easy to write for a wide variety of tables!
My translation of those examples to `TypedTables.Table` is below. It also works as-is with other implementations such as `Tables.rowtable` (a vector of named tuples). As a bonus, all operations are easily "pipeable" (:
```julia
julia> using Random

julia> using SplitApplyCombine

julia> using Tables

julia> using TypedTables

julia> using DataPipes

julia> using Missings

julia> table = Table(
           id = shuffle(1:5),
           group = rand('a':'b', 5),
           weight_kg = randn(5) .* 5 .+ 60,
           height_cm = randn(5) .* 10 .+ 170
       )
Table with 4 columns and 5 rows:
     id  group  weight_kg  height_cm
   ┌────────────────────────────────
 1 │ 1   a      55.1194    150.885
 2 │ 4   b      55.2233    164.44
 3 │ 3   a      51.9789    178.343
 4 │ 5   a      59.483     166.938
 5 │ 2   b      53.3129    179.829

julia> @p table |> map((height_m = _.height_cm / 100,))
Table with 1 column and 5 rows:
     height_m
   ┌─────────
 1 │ 1.50885
 2 │ 1.6444
 3 │ 1.78343
 4 │ 1.66938
 5 │ 1.79829

julia> @p table |> map((w = _.weight_kg, h = _.height_cm))
Table with 2 columns and 5 rows:
     w        h
   ┌─────────────────
 1 │ 55.1194  150.885
 2 │ 55.2233  164.44
 3 │ 51.9789  178.343
 4 │ 59.483   166.938
 5 │ 53.3129  179.829
```
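In DataPipes, `_` stands for the argument of an anonymous function, and each `|>` step passes the previous result as the last argument. Roughly, then, the two `map` calls above correspond to the plain-Base code below, shown on a `Tables.rowtable`-style vector of named tuples. This is a sketch, not the macro's literal expansion; the sample rows are copied from the rounded output above, and the names `rows`, `heights` and `wh` are illustrative.

```julia
# A rowtable: a Vector of NamedTuples (sample values copied from the output above).
rows = [
    (id = 1, group = 'a', weight_kg = 55.1194, height_cm = 150.885),
    (id = 4, group = 'b', weight_kg = 55.2233, height_cm = 164.44),
    (id = 3, group = 'a', weight_kg = 51.9789, height_cm = 178.343),
]

# Compute a new column: each row maps to a one-field NamedTuple.
heights = map(r -> (height_m = r.height_cm / 100,), rows)

# Select and rename columns: the same pattern with two fields.
wh = map(r -> (w = r.weight_kg, h = r.height_cm), rows)
```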
```julia
julia> @p table |> mutate(weight_g = _.weight_kg * 1000)
Table with 5 columns and 5 rows:
     id  group  weight_kg  height_cm  weight_g
   ┌──────────────────────────────────────────
 1 │ 1   a      55.1194    150.885    55119.4
 2 │ 4   b      55.2233    164.44     55223.3
 3 │ 3   a      51.9789    178.343    51978.9
 4 │ 5   a      59.483     166.938    59483.0
 5 │ 2   b      53.3129    179.829    53312.9
```
```julia
julia> @p table |> mutate(BMI = _.weight_kg / (_.height_cm / 100)^2)
Table with 5 columns and 5 rows:
     id  group  weight_kg  height_cm  BMI
   ┌─────────────────────────────────────────
 1 │ 1   a      55.1194    150.885    24.2111
 2 │ 4   b      55.2233    164.44     20.4223
 3 │ 3   a      51.9789    178.343    16.3424
 4 │ 5   a      59.483     166.938    21.3444
 5 │ 2   b      53.3129    179.829    16.486
```
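`mutate` keeps every existing column and appends the new ones. For a rowtable, the same effect can be sketched in plain Julia by `merge`-ing each row with a named tuple of new fields. This is a conceptual equivalent under that assumption, not DataPipes' actual implementation, and `mutate_rows` is a hypothetical helper name.

```julia
rows = [
    (id = 1, group = 'a', weight_kg = 55.1194, height_cm = 150.885),
    (id = 4, group = 'b', weight_kg = 55.2233, height_cm = 164.44),
]

# Hypothetical helper: keep all fields of each row and append the fields returned by f.
mutate_rows(f, rows) = map(r -> merge(r, f(r)), rows)

# BMI = weight in kg divided by the square of height in metres.
with_bmi = mutate_rows(rows) do r
    h_m = r.height_cm / 100
    (height_m = h_m, BMI = r.weight_kg / h_m^2)
end
```

Because `merge` puts the fields of its second argument last, the new columns appear after the original ones, matching the column order in the output above.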
```julia
julia> g = @p table |> group(iseven(_.id))
2-element Dictionaries.Dictionary{Bool, Table{NamedTuple{(:id, :group, :weight_kg, :height_cm), Tuple{Int64, Char, Float64, Float64}}, 1, NamedTuple{(:id, :group, :weight_kg, :height_cm), Tuple{Vector{Int64}, Vector{Char}, Vector{Float64}, Vector{Float64}}}}}
 false │ Table with 4 columns and 3 rows:
     id  group  weight_kg  height_cm
   ┌────────────────────────────────
 1 │ 1   a      55.1194    150.885
 2 │ 3   a      51.9789    178.343
 3 │ 5   a      59.483     166.938
  true │ Table with 4 columns and 2 rows:
     id  group  weight_kg  height_cm
   ┌────────────────────────────────
 1 │ 4   b      55.2233    164.44
 2 │ 2   b      53.3129    179.829

julia> @p g |> map(@p(sum(_.weight_kg, _1)))
2-element Dictionaries.Dictionary{Bool, Float64}
 false │ 166.58126233561455
  true │ 108.53621402076169
```
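`group` (from SplitApplyCombine.jl) is the "split" step, and mapping `sum` over the groups is "apply + combine". A minimal Base-only sketch of the same computation on a rowtable, using a `Dict` in place of the package's dictionary type (variable names are illustrative, sample values copied from the output above):

```julia
rows = [
    (id = 1, group = 'a', weight_kg = 55.1194),
    (id = 4, group = 'b', weight_kg = 55.2233),
    (id = 3, group = 'a', weight_kg = 51.9789),
    (id = 5, group = 'a', weight_kg = 59.483),
    (id = 2, group = 'b', weight_kg = 53.3129),
]
T = eltype(rows)

# Split: bucket rows by the key iseven(id); rows keep their original order within a bucket.
groups = Dict{Bool, Vector{T}}()
for r in rows
    push!(get!(() -> T[], groups, iseven(r.id)), r)
end

# Apply + combine: total weight per group.
totals = Dict(k => sum(r -> r.weight_kg, v) for (k, v) in groups)
```

One difference: the `Dictionaries.Dictionary` returned by `group` preserves insertion order of its keys, whereas a `Dict` makes no ordering guarantee.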
```julia
julia> @p table |> sort(by=-sqrt(_.height_cm))
Table with 4 columns and 5 rows:
     id  group  weight_kg  height_cm
   ┌────────────────────────────────
 1 │ 2   b      53.3129    179.829
 2 │ 3   a      51.9789    178.343
 3 │ 5   a      59.483     166.938
 4 │ 4   b      55.2233    164.44
 5 │ 1   a      55.1194    150.885
```
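Since `sqrt` is monotonically increasing, sorting by `-sqrt(height_cm)` gives the same order as sorting by `height_cm` with `rev = true`. A quick Base-Julia check on a rowtable (sample values copied from the output above):

```julia
rows = [
    (id = 1, height_cm = 150.885),
    (id = 4, height_cm = 164.44),
    (id = 3, height_cm = 178.343),
    (id = 5, height_cm = 166.938),
    (id = 2, height_cm = 179.829),
]

# The key used in the transcript: descending by sqrt(height).
a = sort(rows; by = r -> -sqrt(r.height_cm))

# Equivalent ordering without the transform, since sqrt preserves order.
b = sort(rows; by = r -> r.height_cm, rev = true)
```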