I’m happy to announce JuliaDBMeta, a set of macros to simplify data manipulation with JuliaDB tables. It can be considered a “port” of DataFramesMeta to JuliaDB that exploits two features of JuliaDB tables:
- fast row iteration
- complete type information about columns
It lets you manipulate data by referring to columns directly by name, so only the relevant columns are selected in `@groupby` and `@map` operations (the macro knows which columns the user needs). Query’s `{}` syntax for automatically naming columns is also supported:
```julia
julia> using JuliaDB, JuliaDBMeta

julia> t = table([1,2,1,2], [4,5,6,7], [0.1, 0.2, 0.3, 0.4], names = [:x, :y, :z])
Table with 4 rows, 3 columns:
x  y  z
─────────
1  4  0.1
2  5  0.2
1  6  0.3
2  7  0.4

julia> @groupby t :x {mean(:y) + mean(:z)}
Table with 2 rows, 2 columns:
x  mean(y) + mean(z)
────────────────────
1  5.2
2  6.3

julia> @map t (:x + :y)/:z
4-element Array{Float64,1}:
 50.0
 35.0
 23.3333
 22.5
```
`@apply` and `@applycombine` allow chaining many of these operations together (`@applycombine` does so after grouping), and normal JuliaDB operations can be thrown into the mix as well, with `_` standing for the result of the previous step:
```julia
julia> @apply t begin
           @where :x == 2
           @transform {:x + :y}
           sort(_, :z)
       end
Table with 2 rows, 4 columns:
x  y  z    x + y
────────────────
2  5  0.2  7
2  7  0.4  9
```
```julia
julia> iris = loadtable(Pkg.dir("JuliaDBMeta", "test", "tables", "iris.csv"));

julia> @applycombine iris :Species begin
           select(_, 1:3, by = i -> i.SepalWidth, rev = true)
           @map {:SepalWidth, Ratio = :SepalLength / :SepalWidth}
           sort(_, by = i -> i.SepalWidth, rev = true)
       end
Table with 9 rows, 3 columns:
Species       SepalWidth  Ratio
─────────────────────────────────
"setosa"      4.4         1.29545
"setosa"      4.2         1.30952
"setosa"      4.1         1.26829
"versicolor"  3.4         1.76471
"versicolor"  3.3         1.90909
"versicolor"  3.2         1.84375
"virginica"   3.8         2.07895
"virginica"   3.8         2.02632
"virginica"   3.6         2.0
```
Plotting is also supported at the end of this pipeline using StatPlots’ `@df` macro:
```julia
julia> using StatPlots

julia> @apply iris begin
           @where :SepalLength > 4
           @transform {ratio = :PetalLength / :PetalWidth}
           @df scatter(:PetalLength, :ratio, group = :Species)
       end
```
Distributed tables are partially supported (not all macros work with them yet), but I’m working on it: this is the next important step for the package.
For more information, see the README.
Feedback is welcome!