My understanding is that the package:
- was a fresh rewrite (EDIT: after reading the package's source code, it seems the creator took the parts of the DataFrames.jl sources that they liked and dropped the parts that were baggage), so it is not bound by the requirement of not breaking things that we have in DataFrames.jl.
- currently makes more assumptions about what data it can store/process and uses these assumptions in its algorithms (DataFrames.jl is designed to store anything that is valid Julia “as is”). Of course, these restrictions may be lifted in the future.
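To make the “as is” part concrete on the DataFrames.jl side, here is a minimal sketch (the column names `id`, `f`, and `t` are made up for illustration, and it assumes DataFrames.jl is installed):

```julia
using DataFrames

# Columns can hold arbitrary Julia values: vectors, functions, tuples.
df = DataFrame(id = [[1], [2], [3]],
               f  = [sin, cos, tan],
               t  = [(1, "a"), (2, "b"), (3, "c")])

# The element types are kept unchanged rather than converted.
id_type = eltype(df.id)
f_type = eltype(df.f)
```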
An example of the second point:
julia> name = Dataset(ID = vcat.([1, 2, 3]), Name = ["John Doe", "Jane Doe", "Joe Blogs"])
3×2 Dataset
 Row │ ID        Name
     │ identity  identity
     │ Array…?   String?
─────┼─────────────────────
   1 │ [1]       John Doe
   2 │ [2]       Jane Doe
   3 │ [3]       Joe Blogs
julia> job = Dataset(ID = vcat.([1, 2, 2, 4]), Job = ["Lawyer", "Doctor", "Florist", "Farmer"])
4×2 Dataset
 Row │ ID        Job
     │ identity  identity
     │ Array…?   String?
─────┼────────────────────
   1 │ [1]       Lawyer
   2 │ [2]       Doctor
   3 │ [2]       Florist
   4 │ [4]       Farmer
julia> leftjoin(name, job, on = :ID)
ERROR: MethodError: Cannot `convert` an object of type Vector{Int64} to an object of type Integer
julia> leftjoin(DataFrame(name), DataFrame(job), on = :ID)
4×3 DataFrame
 Row │ ID      Name       Job
     │ Array…  String     String?
─────┼────────────────────────────
   1 │ [1]     John Doe   Lawyer
   2 │ [2]     Jane Doe   Doctor
   3 │ [2]     Jane Doe   Florist
   4 │ [3]     Joe Blogs  missing
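
My guess at why the second call works: a join that only relies on `hash` and `isequal` for its keys can handle any key type, and both are defined for arrays in Julia. The sketch below is a hand-written hash-based left join in plain Julia that shows the idea. It is not how DataFrames.jl actually implements joins, and `hash_leftjoin`, `name_rows`, and `job_rows` are names I made up for illustration.

```julia
# The same data as above, as vectors of named tuples with Vector{Int} keys.
name_rows = [(ID = [1], Name = "John Doe"),
             (ID = [2], Name = "Jane Doe"),
             (ID = [3], Name = "Joe Blogs")]
job_rows = [(ID = [1], Job = "Lawyer"),
            (ID = [2], Job = "Doctor"),
            (ID = [2], Job = "Florist"),
            (ID = [4], Job = "Farmer")]

function hash_leftjoin(left, right)
    # Index the right table by key. Dict only needs hash and isequal,
    # so a Vector{Int} key works just like a scalar key.
    index = Dict{Vector{Int}, Vector{String}}()
    for row in right
        push!(get!(() -> String[], index, row.ID), row.Job)
    end

    result = Any[]
    for row in left
        # Left rows without a match get a single `missing` on the right side.
        for job in get(index, row.ID, [missing])
            push!(result, (ID = row.ID, Name = row.Name, Job = job))
        end
    end
    return result
end

result = hash_leftjoin(name_rows, job_rows)
```

Nothing in this sketch assumes the key is an integer, which is the kind of assumption the `convert` error above points to.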