[ANN] A new lightning fast package for data manipulation in pure Julia

bkamins · March 21, 2022, 2:33pm

My understanding is that the package:

was a fresh rewrite (EDIT: after reading the package's source code, it seems it took the parts of the DataFrames.jl sources that the creator liked and dropped the parts that were baggage), so it does not carry the burden of preserving backward compatibility that we have in DataFrames.jl.
currently makes more assumptions about what data it can store and process, and uses these assumptions in its algorithms (DataFrames.jl is designed to store anything that is valid Julia "as is"). Of course, these restrictions may be lifted in the future.

An example of the second point:

julia> name = Dataset(ID = vcat.([1, 2, 3]), Name = ["John Doe", "Jane Doe", "Joe Blogs"])
3×2 Dataset
 Row │ ID        Name
     │ identity  identity
     │ Array…?   String?
─────┼─────────────────────
   1 │ [1]       John Doe
   2 │ [2]       Jane Doe
   3 │ [3]       Joe Blogs

julia> job = Dataset(ID = vcat.([1, 2, 2, 4]), Job = ["Lawyer", "Doctor", "Florist", "Farmer"])
4×2 Dataset
 Row │ ID        Job
     │ identity  identity
     │ Array…?   String?
─────┼────────────────────
   1 │ [1]       Lawyer
   2 │ [2]       Doctor
   3 │ [2]       Florist
   4 │ [4]       Farmer

julia> leftjoin(name, job, on = :ID)
ERROR: MethodError: Cannot `convert` an object of type Vector{Int64} to an object of type Integer

julia> leftjoin(DataFrame(name), DataFrame(job), on = :ID)
4×3 DataFrame
 Row │ ID      Name       Job
     │ Array…  String     String?
─────┼────────────────────────────
   1 │ [1]     John Doe   Lawyer
   2 │ [2]     Jane Doe   Doctor
   3 │ [2]     Jane Doe   Florist
   4 │ [3]     Joe Blogs  missing
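
To make the "as is" point a bit more concrete on the DataFrames.jl side, here is a minimal sketch. It assumes only that DataFrames.jl is installed; the column names (`key`, `payload`, `extra`) and the values are made up for illustration and do not come from either package's documentation.

```julia
using DataFrames

# Columns can hold arbitrary Julia values; nothing is converted on the way in.
df = DataFrame(key = [(1, "a"), (2, "b")], payload = Any[sin, [1, 2, 3]])

# The element type of the key column stays a tuple type,
# and the stored function is the very same object.
@assert eltype(df.key) == Tuple{Int, String}
@assert df.payload[1] === sin

# A join on a non-scalar key (tuples here, vectors in the example above) also works.
other = DataFrame(key = [(1, "a"), (3, "c")], extra = ["x", "y"])
joined = leftjoin(df, other, on = :key)

# Both left rows are kept; the one without a match gets `missing` in `extra`.
@assert nrow(joined) == 2
@assert count(ismissing, joined.extra) == 1
```

This is not a statement about what the new package can or cannot do beyond the error shown above; it only illustrates the DataFrames.jl design choice of not restricting element types.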
