Hi,
before the Julia & Data: An Evolving Ecosystem BOF I wanted to run a quick poll on what you think about the different major directions in DataFrames.jl development. So please upvote the things you perceive as important. This will help me decide where to focus the effort in the short term (and we can discuss this during the BOF).
- add threading support
- faster joins
- faster aggregation
- adding more expressiveness to the mini-language
- adding more utility functions in the package
- display improvements
- decoupling of DataFramesBase.jl
- solving problem with CategoricalArrays.jl compilation times
- adding metadata to DataFrame
Now let me give some comments on the options:
- add threading support - @nalimilan has been working on this recently; it is harder than we thought
- faster joins - we probably need a redesign; maybe even delegate the task to a separate package that does it universally for Tables.jl-compliant types
- faster aggregation - adding issorted support is probably the biggest gain we are not exploiting right now
- adding more expressiveness to the mini-language - I am in favor of this, but I am slightly afraid of overcomplicating things, which would make the learning curve for newcomers very steep (selected open issues: https://github.com/JuliaData/DataFrames.jl/pull/2281, https://github.com/JuliaData/DataFrames.jl/pull/2228, https://github.com/JuliaData/DataFrames.jl/issues/2328, https://github.com/JuliaData/DataFrames.jl/issues/2314, https://github.com/JuliaData/DataFrames.jl/issues/2272, https://github.com/JuliaData/DataFrames.jl/issues/2258, https://github.com/JuliaData/DataFrames.jl/issues/2227, https://github.com/JuliaData/DataFrames.jl/issues/2220, https://github.com/JuliaData/DataFrames.jl/issues/2207, https://github.com/JuliaData/DataFrames.jl/issues/2203, https://github.com/JuliaData/DataFrames.jl/issues/2171, https://github.com/JuliaData/DataFrames.jl/issues/2133, https://github.com/JuliaData/DataFrames.jl/issues/2106)
- adding more utility functions in the package - we can have it and I like the ideas, but the question is whether this should be left to external packages so that we can limit the DataFrames.jl API, which would simplify maintenance and learning (selected sample issues: https://github.com/JuliaData/DataFrames.jl/pull/2169, https://github.com/JuliaData/DataFrames.jl/pull/1864, https://github.com/JuliaData/DataFrames.jl/pull/1181, https://github.com/JuliaData/DataFrames.jl/issues/2325, https://github.com/JuliaData/DataFrames.jl/issues/2275, https://github.com/JuliaData/DataFrames.jl/issues/2259, https://github.com/JuliaData/DataFrames.jl/issues/2257, https://github.com/JuliaData/DataFrames.jl/issues/2243, https://github.com/JuliaData/DataFrames.jl/issues/2053, https://github.com/JuliaData/DataFrames.jl/issues/2048, https://github.com/JuliaData/DataFrames.jl/issues/659)
- display improvements - these things are hard and pop up often; the major question is whether we want to add PrettyTables.jl as an alternative default backend (selected issues: https://github.com/JuliaData/DataFrames.jl/pull/2330, https://github.com/JuliaData/DataFrames.jl/pull/2087, https://github.com/JuliaData/DataFrames.jl/pull/1688, https://github.com/JuliaData/DataFrames.jl/issues/2337, https://github.com/JuliaData/DataFrames.jl/issues/2302, https://github.com/JuliaData/DataFrames.jl/issues/2246, https://github.com/JuliaData/DataFrames.jl/issues/2146, https://github.com/JuliaData/DataFrames.jl/issues/1631, https://github.com/JuliaData/DataFrames.jl/issues/1272, https://github.com/JuliaData/DataFrames.jl/issues/864, https://github.com/JuliaData/DataFrames.jl/issues/592)
- decoupling of DataFramesBase.jl - this would allow users to opt out of the extra API we provide if they do not want it, and would reduce load/compilation times, but maybe it is not needed if we solve the CategoricalArrays.jl compilation time issues (https://github.com/JuliaData/DataFrames.jl/issues/1764, https://github.com/JuliaData/DataFrames.jl/issues/1502)
- solving problem with CategoricalArrays.jl compilation times - I know @nalimilan is currently working on it, but the question is how important it is to solve, given the "time to first plot" impact (https://github.com/JuliaData/DataFrames.jl/issues/2321)
- adding metadata to DataFrame - this is a long-standing missing piece in the core design, but it is hard, so the question is how important people consider it to be (https://github.com/JuliaData/DataFrames.jl/pull/1458, https://github.com/JuliaData/DataFrames.jl/issues/2276, Metadata for columns and/or DataFrames · Issue #35 · JuliaData/DataFrames.jl · GitHub)