Thanks for the detailed writeup! I’ve now also read about TidierDB a bit more carefully. It does indeed look very convenient, and well suited to those coming from R’s non-standard evaluation approach. Happy for that crowd: now they also have a familiar data manipulation interface in Julia!
Meanwhile, SQLCollections.jl follows the philosophy of regular Julia functions. Most fundamentally, that means composability, in several senses.
Note that SQLCollections doesn’t even have to do anything special for these – it’s automatic whenever one writes Julia functions.
One sense is simply referential transparency: I can take an expression passed to a function, assign it to a variable, and pass that variable to the function instead, and the result is exactly the same. This isn’t how R or Tidier** macros work: there, one cannot simply assign the function/macro argument to a variable and pass it around.
For example, in regular Julia data manipulation these are all equivalent, and of course the same holds with SQLCollections:
... filter(@o _.a > 3) ...
# is the same as
pred = @o _.a > 3
... filter(pred) ...
# is the same as
val = 3
... filter(@o _.a > val) ...
# is the same as
val = 3
func = >
... filter(@o func(_.a, val)) ...
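A runnable version of these equivalences, as a minimal sketch on a plain vector of named tuples. It uses ordinary anonymous functions in place of `@o` so that it needs only Base Julia, and the `rows` data is made up for illustration.

```julia
# Hypothetical sample data: a vector of named tuples standing in for a table.
rows = [(a = 1, b = "x"), (a = 4, b = "y"), (a = 7, b = "z")]

# 1. Predicate written inline.
r1 = filter(r -> r.a > 3, rows)

# 2. The same predicate assigned to a variable first.
pred = r -> r.a > 3
r2 = filter(pred, rows)

# 3. The threshold pulled out into a variable.
val = 3
r3 = filter(r -> r.a > val, rows)

# 4. The comparison operator pulled out as well
#    (parenthesized here only to make the assignment unambiguous).
func = (>)
r4 = filter(r -> func(r.a, val), rows)

# All four spellings select the same rows.
@assert r1 == r2 == r3 == r4
```

Nothing here is specific to a particular package: the predicate, the threshold, and the operator are ordinary values, so they can be named and passed around freely.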
Another manifestation of composability is being able to write separate functions for parts of the pipeline. Again, this is ubiquitous in Julia, including SQLCollections:
do_some_selection(data) = filter(..., data)
... |> do_some_selection |> ...
In R, or with Julia macros, this is also possible, but it takes extra care to get right.
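To make the function-based version concrete, here is a small sketch in Base Julia. The step names `keep_large` and `only_a` and the sample `rows` are hypothetical, chosen only for illustration.

```julia
# Hypothetical pipeline steps, each an ordinary function of the data.
keep_large(data) = filter(r -> r.a > 3, data)
only_a(data) = map(r -> r.a, data)

rows = [(a = 1, b = "x"), (a = 4, b = "y"), (a = 7, b = "z")]

# Steps chain with |> like any other function...
piped = rows |> keep_large |> only_a

# ...or combine with ∘ into a reusable pipeline value.
pipeline = only_a ∘ keep_large
composed = pipeline(rows)

@assert piped == composed
```

Because each step is a plain function, it can be tested on its own, stored in a variable, or reused in another pipeline without any special handling.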
And finally, composability with a wide range of types: the exact same code and functions work on everything from collections to SQL databases. Some, like `filter`, even work for dataframes; unfortunately, dataframes don’t support other common data manipulation functions.
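As a small illustration of writing a step once and reusing it across container types, here is a sketch limited to in-memory Base Julia containers (it does not touch a database). The helper `keep_large` and the sample data are hypothetical.

```julia
# Hypothetical helper written once, without naming a container type.
keep_large(data) = filter(r -> r.a > 3, data)

as_vector = [(a = 1,), (a = 4,), (a = 7,)]
as_tuple  = ((a = 1,), (a = 4,), (a = 7,))
as_set    = Set(as_vector)

# The same function applies to each container and keeps its kind.
from_vector = keep_large(as_vector)   # a Vector
from_tuple  = keep_large(as_tuple)    # a Tuple
from_set    = keep_large(as_set)      # a Set
```

The function body never mentions the container, so whichever `filter` method matches the argument type gets used.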
As for SQLCollections vs TidierDB comparison specifically, now that I understand it a bit more, I’d highlight these points:
- SQLCollections uses the most general syntax, applicable to many data types in Julia. By contrast, the only syntax similar to TidierDB’s (afaik) is TidierData’s, which supports only one data type: dataframes.
- SQLCollections lets you write the exact same syntax for Julia collections and for databases. TidierDB uses syntax similar to TidierData, but (afaik) not exactly the same: the former requires the `DB.` prefix, so writing a single function to process both is impossible. (Although @drizk1, you say that the syntax is the same, so I’m confused.)
- SQLCollections doesn’t define any macros, and its interface is function-based. This gives native, familiar Julia semantics and composability, something nontrivial or even impossible to get in a macro-based interface.
It’s nice to see different approaches to design when solving similar problems. It highlights differences in background, I guess.