A serious data start-up structured around a Julia data manipulation framework for larger-than-RAM data

DTables.jl doesn’t do query optimization, and its current map and filter API is not very conducive to it. A lazy map (i.e. select) or filter operator needs to know exactly which columns it operates on in order to enable relational algebra rewrites, such as pushing a filter below a join or reading only the columns a query actually uses. With the current API, those columns are hidden inside the opaque function f that is passed to map or filter. Having the map function take a row and return a row makes optimization harder still, because the optimizer cannot tell which columns the output will contain. Overall, DTables.jl does not seem to have been designed with query optimization in mind.
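To make the point about opaque functions concrete, here is a small, hypothetical sketch in Python (not DTables.jl's API, and not the design of the package mentioned below). It contrasts operators that wrap an arbitrary callable with declarative operators that name their columns, and implements one simple rewrite, projection pushdown: walking the pipeline backwards to find which stored columns must actually be loaded. The class names `ColumnFilter`, `ColumnMap`, `OpaqueFilter`, `OpaqueMap` and the function `columns_to_read` are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


class Op:
    """Base operator: by default nothing is known about the columns involved."""

    def columns_used(self) -> Optional[set]:
        return None  # None means "unknown", which blocks optimization

    def columns_produced(self) -> set:
        return set()


@dataclass
class OpaqueFilter(Op):
    """Like filter(f, table): the optimizer only sees a callable row -> bool."""
    f: Callable[[dict], bool]


@dataclass
class OpaqueMap(Op):
    """Like map(f, table): row in, row out; inputs and output schema are hidden."""
    f: Callable[[dict], dict]


@dataclass
class ColumnFilter(Op):
    """Hypothetical declarative filter: `column op value` is plain data."""
    column: str
    op: str  # e.g. "<", ">", "=="
    value: Any

    def columns_used(self) -> Optional[set]:
        return {self.column}


@dataclass
class ColumnMap(Op):
    """Hypothetical declarative select: names its input columns and its output column."""
    inputs: tuple
    output: str
    fn: Callable[..., Any]

    def columns_used(self) -> Optional[set]:
        return set(self.inputs)

    def columns_produced(self) -> set:
        return {self.output}


def columns_to_read(result_columns: set, ops: list) -> Optional[set]:
    """Projection pushdown: which stored columns must be loaded for this pipeline?

    Returns None if any operator is opaque, i.e. every column must be read.
    """
    needed = set(result_columns)
    # Walk from the pipeline's output back toward storage.
    for op in reversed(ops):
        used = op.columns_used()
        if used is None:
            return None  # opaque f: fall back to reading the whole table
        needed -= op.columns_produced()  # derived columns are not read from storage
        needed |= used
    return needed


pipeline = [
    ColumnFilter("age", ">", 30),
    ColumnMap(("weight", "height"), "bmi", lambda w, h: w / h**2),
]
print(sorted(columns_to_read({"name", "bmi"}, pipeline)))  # ['age', 'height', 'name', 'weight']
print(columns_to_read({"name"}, [OpaqueFilter(lambda row: row["age"] > 30)]))  # None
```

With the declarative operators the optimizer can skip every column not listed; as soon as one step is an opaque closure (or a row-to-row map whose output schema is unknown), the only safe answer is to read everything.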

The package I am developing is primarily targeted at working with in-memory data, and secondarily targeted at working with larger-than-memory data. Distributed data is a distant third.