With the release of DataFrames 1.0, I was curious how its data-access performance compares with something like an indexed table. The example: tabular data with a key column, where I want to return the subtable of rows whose key matches a given value.
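using DataFrames
using JuliaDB
using BenchmarkTools  # provides @btime, used for the timings below
df = DataFrame(sex = ["male", "male", "female", "male", "female", "male"], age = [20, 14, 65, 34, 23, 67])
# DataFrames: select the matching rows with a boolean mask
df[df.sex .== "male", :]
# JuliaDB: build an NDSparse from the same data and look the key up directly
tbl = ndsparse(df)
tbl[("male",)]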
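For reference, the closest keyed lookup I know of within DataFrames itself is to group by the key column and index the resulting `GroupedDataFrame` with a key tuple. This is only a minimal sketch of that alternative, not something I benchmarked above; the variable names `gdf` and `males` are my own.

```julia
using DataFrames

df = DataFrame(sex = ["male", "male", "female", "male", "female", "male"],
               age = [20, 14, 65, 34, 23, 67])

# Group once by the key column; the grouping can be reused for many lookups.
gdf = groupby(df, :sex)

# Index by key tuple, analogous to tbl[("male",)] on the NDSparse.
# The result is a view (SubDataFrame) into df, not a copy.
males = gdf[("male",)]
```

Because the result is a view rather than a new table, its timing is not directly comparable with the copying mask lookup.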
When I benchmark both on a laptop, I get the results below. The DataFrame filter comes out faster than the NDSparse lookup, but I am not sure whether that is simply because this example is so tiny.
julia> @btime $df[$df.sex .== "male", :]
1.450 μs (21 allocations: 1.58 KiB)
4×2 typename(DataFrame)
│ Row │ sex    │ age   │
│     │ String │ Int64 │
├─────┼────────┼───────┤
│ 1   │ male   │ 20    │
│ 2   │ male   │ 14    │
│ 3   │ male   │ 34    │
│ 4   │ male   │ 67    │
julia> @btime $tbl[("male",)]
6.778 μs (80 allocations: 4.44 KiB)
1-d NDSparse with 4 values (1 field named tuples):
sex    │ age
───────┼────
"male" │ 20
"male" │ 14
"male" │ 34
"male" │ 67