# [ANN] DataKnots.jl - an extensible, practical and coherent algebra of query combinators

#1

DataKnots implements an algebraic query interface based on Query Combinators. This algebra’s elements, or queries, represent relationships among class entities and data types. This algebra’s operations, or combinators, are used to construct query expressions.

We seek to prove that this query algebra has significant advantages over the state of the art:

• DataKnots is a practical alternative to SQL with a declarative syntax; this makes it suitable for use by domain experts.
• DataKnots’ data model handles nested and recursive structures (unlike DataFrames or SQL); this makes it suitable for working with CSV, JSON, XML, and SQL databases.
• DataKnots has a formal semantic model based upon monadic composition; this makes it easy to reason about the structure and interpretation of queries.
• DataKnots is a combinator algebra (like XPath but unlike LINQ or SQL); this makes it easier to assemble queries dynamically.
• DataKnots is fully extensible with Julia; this makes it possible to specialize it into various domain-specific query languages.
#2

This looks really cool! I’ll take a look.

Do you know offhand how it compares to @davidanthoff’s Query.jl?

#3

Here are some points of comparison from what I’ve seen:

• Queries in DataKnots are built from subqueries. These can be assigned names and used independently, e.g., for testing or abstraction (like functions).
• DataKnots works on nested data, while Query works (mostly) with tabular data, row-by-row.
• DataKnots works with higher-order functions, while Query works with macros.

Also, for the first release, DataKnots works with its own datatype for in-memory columnar storage, while Query works on any datatype that implements the TableTraits(?) interface. But I’m guessing that this is going to change in future releases.
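For readers unfamiliar with columnar storage of nested data, here is a minimal Python sketch of one common layout: a flat array of values plus an offsets array, similar to the list layout in Apache Arrow. This is an illustrative assumption, not DataKnots’ actual storage type; the data and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_columnar(rows: Sequence[Sequence[str]]) -> Tuple[List[int], List[str]]:
    """Flatten a list of lists into an (offsets, values) pair.

    The elements of row i live in values[offsets[i]:offsets[i + 1]],
    so an empty row is just two equal consecutive offsets.
    """
    offsets = [0]
    values: List[str] = []
    for row in rows:
        values.extend(row)
        offsets.append(len(values))
    return offsets, values


def get_row(offsets: List[int], values: List[str], i: int) -> List[str]:
    """Recover row i from the columnar representation."""
    return values[offsets[i]:offsets[i + 1]]


# Hypothetical data: employee names grouped by department.
names_by_dept = [["ann", "bob"], [], ["cy"]]
offsets, values = to_columnar(names_by_dept)
print(offsets)                      # [0, 2, 2, 3]
print(values)                       # ['ann', 'bob', 'cy']
print(get_row(offsets, values, 2))  # ['cy']
```

Because all values of one level sit in a single flat array, operations over a whole column (counting, filtering) can work on contiguous memory, while the offsets preserve the nesting.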

#4

It’s a great question! Let’s see…

• Both serve the same purpose, to query data.
• Query.jl is usable now, but I wouldn’t recommend DataKnots.jl to end users just yet. It does not provide an easy way to import data, and some basic combinators like group/sort/join are not yet implemented. We plan to add them soon…
• The biggest difference is the query model. Unlike Query.jl, DataKnots.jl is an algebra and does not rely on variables, binding operations, or macros. This gives it a distinct declarative flavor and makes it easier to construct queries incrementally or dynamically.

#5

Just a few points:

• Query.jl does not work primarily on tabular data; it just happens to be the case that that is what folks use it for. In fact, its core doesn’t even know what a table is. The core abstraction in Query.jl is iterators, not rows. Iterators of named tuples also work well, and those are tables, but that is really just a special case that happens to be popular.
• The macro part in Query.jl only provides syntactic sugar for 1) named tuples and 2) anonymous functions in the piped version of things (for the LINQ-style queries it does more, of course). The piped version is conceptually based on higher-order functions.
• One can use Query.jl with hierarchical/nested data! There are lots of examples of that in the LINQ world. It might be less elegant than the DataKnots story; I’m not sure.
• The core idea of Query.jl is also based on monadic composition. There are lots of old Channel 9 videos that go into the details of that for the LINQ case, and pretty much everything said there applies to Query.jl as well.
• Regarding “does not rely on variables, binding operations, or macros”: I’m not entirely sure what you mean by this. But at least the macro part is not important, I think; see above.
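To make the “iterators plus higher-order functions” point concrete, here is a rough Python analog of the piped style (not Query.jl code). Each stage is a function that takes an iterator and returns a new lazy iterator, and a hypothetical `pipe` helper chains the stages. Nested fields, such as a list of employees, are ordinary values flowing through the iterator, and the predicate names its row variable `p`.

```python
from typing import Any, Callable, Iterable, Iterator

# A stage turns one iterable into a new lazy iterator.
Stage = Callable[[Iterable[Any]], Iterator[Any]]


def filter_(pred: Callable[[Any], bool]) -> Stage:
    """Stage that lazily keeps the items satisfying pred."""
    return lambda items: (x for x in items if pred(x))


def map_(f: Callable[[Any], Any]) -> Stage:
    """Stage that lazily applies f to every item."""
    return lambda items: (f(x) for x in items)


def flatmap(f: Callable[[Any], Iterable[Any]]) -> Stage:
    """Stage that maps each item to an iterable and flattens the results."""
    return lambda items: (y for x in items for y in f(x))


def pipe(source: Iterable[Any], *stages: Stage) -> Iterator[Any]:
    """Feed source through each stage in order, like a |> chain."""
    result: Iterable[Any] = source
    for stage in stages:
        result = stage(result)
    return iter(result)


# Hypothetical nested data: departments, each holding a list of employees.
departments = [
    {"name": "IT", "employee": [{"name": "ann", "age": 54}, {"name": "bob", "age": 31}]},
    {"name": "HR", "employee": [{"name": "cy", "age": 58}]},
]

older = pipe(
    departments,
    flatmap(lambda d: d["employee"]),
    filter_(lambda p: p["age"] > 50),
    map_(lambda p: p["name"]),
)
print(list(older))  # ['ann', 'cy']
```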

Great to see this effort, I really should look into the details a bit more!

#6

LINQ and DataKnots are both based on monads, but they use different primitives as the building blocks, which give them very different flavors. LINQ uses monadic comprehension, which lets you transform a monadic container by a chain of monadic functions: $MA \to (A \to MB) \to MB$, then $MB \to (B \to MC) \to MC$. DataKnots leaves out containers and uses monadic composition: $(A \to MB) \to (B \to MC) \to (A \to MC)$. The difference is in the type of objects you want to deal with. With comprehension, you operate on monadic containers $MX$; with composition, on monadic functions $X \to MY$.
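The two signatures can be illustrated with the list monad, where $MX$ is a list of $X$. In this minimal Python sketch (an illustration, not code from either library), `bind` has the comprehension shape $MA \to (A \to MB) \to MB$ and `kleisli` has the composition shape $(A \to MB) \to (B \to MC) \to (A \to MC)$. Because `kleisli` returns a function rather than a container, a whole pipeline can be assembled as a value, e.g. with `functools.reduce`, before any data is supplied. The data and helper names are hypothetical.

```python
from functools import reduce
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


def bind(ma: List[A], f: Callable[[A], List[B]]) -> List[B]:
    """Comprehension style: MA -> (A -> MB) -> MB (flat-map over a container)."""
    return [b for a in ma for b in f(a)]


def kleisli(f: Callable[[A], List[B]], g: Callable[[B], List[C]]) -> Callable[[A], List[C]]:
    """Composition style: (A -> MB) -> (B -> MC) -> (A -> MC)."""
    return lambda a: bind(f(a), g)


# Hypothetical monadic functions over nested data.
def departments(db: dict) -> List[dict]:
    return db["department"]


def employees(dept: dict) -> List[dict]:
    return dept["employee"]


def name(emp: dict) -> List[str]:
    return [emp["name"]]


db = {"department": [
    {"employee": [{"name": "ann"}, {"name": "bob"}]},
    {"employee": [{"name": "cy"}]},
]}

# Comprehension: thread a container through a chain of binds.
via_bind = bind(bind(departments(db), employees), name)

# Composition: build the query first, then apply it to the input.
query = reduce(kleisli, [departments, employees, name])
via_compose = query(db)

print(via_bind)     # ['ann', 'bob', 'cy']
print(via_compose)  # ['ann', 'bob', 'cy']
```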

I agree that macros are not relevant. Regarding variables, this is just as I stated: DataKnots does not use bound variables. For example, compare the query `Filter(It.age .> 50)` with a similar LINQ expression `where p.age > 50`. The LINQ expression is only valid within the context of a `from` binder that binds `p`. On the other hand, `It.age .> 50` is a self-contained query and does not depend on any context.
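The contrast can be mimicked in Python. In the minimal sketch below, a hypothetical `It` object (loosely modeled on the syntax quoted above, not DataKnots’ implementation) lets `It.age > 50` be built as a standalone query value that can be named, stored, and passed to `Filter` with no row variable anywhere, whereas the equivalent lambda has to introduce and name a variable `p`. The `Filter` semantics here (keep the input if any output is `True`) are an assumption.

```python
from typing import Any, Callable, List


class Query:
    """A query maps one input to a list of outputs (a monadic function)."""

    def __init__(self, fn: Callable[[Any], List[Any]]):
        self._fn = fn

    def __call__(self, x: Any) -> List[Any]:
        return self._fn(x)

    def __getattr__(self, name: str) -> "Query":
        # It.age: run this query, then look up `name` on each output row.
        if name.startswith("_"):
            raise AttributeError(name)
        return Query(lambda x: [y[name] for y in self(x)])

    def __gt__(self, other: Any) -> "Query":
        # It.age > 50: a query producing one boolean per output.
        return Query(lambda x: [y > other for y in self(x)])


# Hypothetical identity query: returns its input unchanged.
It = Query(lambda x: [x])


def Filter(cond: Query) -> Query:
    """Keep the input if any output of `cond` is True (assumed semantics)."""
    return Query(lambda x: [x] if any(cond(x)) else [])


people = [{"name": "ann", "age": 54}, {"name": "bob", "age": 31}]

# Self-contained query value: no row variable is named anywhere.
over_50 = It.age > 50
print([p["name"] for p in people if Filter(over_50)(p)])  # ['ann']

# Bound-variable style: the predicate must name its row `p`.
older = lambda p: p["age"] > 50
print([p["name"] for p in people if older(p)])  # ['ann']
```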

#7

Hello all. So, we’ve opted to release DataKnots v0.1.1 in an unfinished state, mostly by pruning features so that we could demonstrate this query combinator approach, with working code and documentation. Even so, there is quite a bit in this release:

• Scalars, vectors and tuples are constant queries (Lift).
• Composition (>>) and identity (It) let one connect queries.
• Aggregates include Count, Sum, Max and Min.
• Record permits new structures to be constructed.
• Filter keeps only the outputs that satisfy a condition.
• Drop / Take allow results to be paged.
• Keep / Given permit intermediate results to be reused.
• Julia functions can be used as query combinators (Lift).
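For a mental model of these combinators, here is a rough Python analog, assuming again that a query maps one input to a list of outputs. The names mirror the list above, but the signatures are hypothetical simplifications rather than the DataKnots.jl API: `compose` stands in for `>>`, `Get` is an invented field-access helper, `Filter` takes a plain Python predicate, `Take`/`Drop` receive the query they truncate as an argument, and `Record` builds a dict.

```python
from typing import Any, Callable, List

Query = Callable[[Any], List[Any]]


def Lift(value: Any) -> Query:
    """Constant query: ignores its input; a list gives a plural result."""
    return lambda _x: list(value) if isinstance(value, list) else [value]


def It(x: Any) -> List[Any]:
    """Identity query."""
    return [x]


def compose(f: Query, g: Query) -> Query:
    """Stand-in for `f >> g`: run g on every output of f."""
    return lambda x: [z for y in f(x) for z in g(y)]


def Get(name: str) -> Query:
    """Hypothetical field access; list-valued fields give plural results."""
    def q(x: Any) -> List[Any]:
        v = x[name]
        return list(v) if isinstance(v, list) else [v]
    return q


def Count(q: Query) -> Query:
    """Aggregate: the number of outputs of q."""
    return lambda x: [len(q(x))]


def Filter(pred: Callable[[Any], bool]) -> Query:
    """Pass the input through only if pred holds."""
    return lambda x: [x] if pred(x) else []


def Take(q: Query, n: int) -> Query:
    """Keep the first n outputs of q."""
    return lambda x: q(x)[:n]


def Drop(q: Query, n: int) -> Query:
    """Skip the first n outputs of q."""
    return lambda x: q(x)[n:]


def Record(**fields: Query) -> Query:
    """Build one dict per input; each field holds the outputs of its query."""
    return lambda x: [{k: f(x) for k, f in fields.items()}]


# Hypothetical nested data.
db = {"department": [
    {"name": "IT", "employee": [{"name": "ann", "salary": 90000},
                                {"name": "bob", "salary": 120000}]},
    {"name": "HR", "employee": [{"name": "cy", "salary": 130000}]},
]}

employees = compose(Get("department"), Get("employee"))
well_paid = compose(employees, Filter(lambda e: e["salary"] > 100000))

print(Count(employees)(db))                          # [3]
print(compose(Take(well_paid, 1), Get("name"))(db))  # ['bob']
print(compose(Get("department"),
              Record(name=Get("name"), size=Count(Get("employee"))))(db))
# [{'name': ['IT'], 'size': [2]}, {'name': ['HR'], 'size': [1]}]
```

Since every combinator returns an ordinary function, intermediate queries such as `employees` and `well_paid` can be named and reused, which is the property the announcement emphasizes.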

The ability to Group, Join, Merge, and Sort data, among many other operations, will be implemented in subsequent releases. Yet, as we add them, the storage system and query mechanics will gain additional complexity that will make the code less digestible. Hence, this particular release is a sweet spot between functionality and complexity that could be studied (and perhaps improved) by others.

Happy Hacking!
