That’s an interesting question. I think what you have in mind sounds more like optimizing Query for JuliaDB. To work with more or less any source, Query only assumes the table can iterate rows, but of course some table implementations can do much more, and Query could take advantage of that. I’m not sure how hard such a project would be, but it’d certainly be useful: maybe a good GSoC idea?
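For concreteness, that iteration contract is minimal: any iterator of rows (e.g. named tuples) works as a source. A toy sketch of the kind of interface Query relies on (not actual Query internals):

```julia
# Any iterable of rows is enough; here, a plain vector of named tuples.
table = [(name = "Ann", age = 30), (name = "Bob", age = 25)]

# A backend-agnostic operation only needs to iterate:
for row in table
    row.age >= 28 && println(row.name)
end
```

A smarter backend (say, JuliaDB with a sorted index) could answer the same query without scanning every row, which is exactly the kind of optimization such a project would add.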
Concerning DataFrames versus JuliaDB, I think they are converging to the same optimum from different sides. DataFrames started not fully typed; then it became clear that for some operations it’d be better to be fully typed, and I think there are plans to create a fully typed wrapper (to avoid code duplication, this could maybe be Columns from IndexedTables). JuliaDB started fully typed, but to simplify modifying a table the not-fully-typed column dictionary ColDict was added, which is pretty much like a DataFrame IIUC…
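To illustrate the trade-off with a hand-rolled sketch (not the actual Columns or ColDict implementations): a fully typed table encodes column names and element types in its type, whereas a column dictionary does not.

```julia
# "Fully typed": a NamedTuple of vectors; names and eltypes live in the
# type, so code specialized on it is fast, but every schema change
# produces a new type (and potentially new compilation).
typed = (x = [1, 2, 3], y = [2.0, 3.0, 4.0])

# "Not fully typed": a Dict of columns; mutating the schema is cheap and
# triggers no recompilation, but column lookups aren't type-stable.
untyped = Dict{Symbol,AbstractVector}(:x => [1, 2, 3])
untyped[:z] = ["a", "b", "c"]  # fine: the container's type is unchanged
```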
As for DataFramesMeta versus JuliaDBMeta, the implementations are actually quite different. DataFramesMeta is pretty much column-based (meaning it extracts the columns before running the code, to circumvent type-stability issues, and then works on those columns), with the exception of @byrow!. JuliaDBMeta, on the other hand, has a few row-wise macros which are fast (at least in theory; I haven’t benchmarked yet…) and work out of the box with out-of-core data: implementing this starting from DataFrames would, I believe, be much more challenging. As a downside, due to the full typing of tables, I need to be careful not to get too high compile times.
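The column-based approach mentioned above is essentially Julia’s function-barrier pattern; a minimal sketch (the kernel function is hypothetical, not DataFramesMeta’s internals):

```julia
using DataFrames

# Reading df.x inside a hot loop is type-unstable, because a DataFrame's
# column types are not part of its type. The trick: extract the columns
# once, then hand them to a function whose arguments are concretely typed.
function kernel(x, y)               # specializes on the concrete vector types
    s = zero(eltype(x)) * zero(eltype(y))
    for i in eachindex(x, y)
        s += x[i] * y[i]
    end
    return s
end

df = DataFrame(x = 1:3, y = [2.0, 3.0, 4.0])
kernel(df.x, df.y)                  # the loop inside runs at full speed
```

The barrier is why column-based macros can be fast even on an untyped table, and also why a row-wise formulation is harder to make fast starting from DataFrames.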