A serious data start-up structured around a Julia data manipulation framework for larger-than-RAM data

CameronBieganek · September 16, 2024, 4:02am

DTables.jl doesn’t do query optimization. The current `map` and `filter` API in DTables.jl is not very conducive to query optimization. A lazy `map` (i.e. `select`) or `filter` operator needs to know exactly which columns are being operated on in order to enable various relational algebra expression rewrites. But with the current API, the columns that are operated on are hidden inside the opaque `f` that is passed to `map` or `filter`. Taking a row and returning a row in the `map` function also makes query optimization more challenging. Overall, it does not seem like DTables.jl was designed with query optimization in mind.
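To make the point about declared columns concrete, here is a minimal sketch of a lazy plan in which each `Filter` states up front which columns its predicate reads. Everything in it (`Plan`, `Scan`, `Select`, `Filter`, `required`) is a hypothetical name made up for illustration; none of it is the DTables.jl API or the API of the package I am developing. With that information the planner can work out that only two of the three source columns ever need to be read, which is one of the rewrites an opaque row function rules out.

```julia
# Hypothetical plan nodes, for illustration only (not DTables.jl types).
abstract type Plan end

struct Scan <: Plan
    table::NamedTuple            # column table, e.g. (name = [...], age = [...])
end

struct Select <: Plan
    child::Plan
    cols::Vector{Symbol}         # columns to keep
end

struct Filter <: Plan
    child::Plan
    cols::Vector{Symbol}         # columns the predicate reads, declared explicitly
    pred::Function               # receives one value per declared column
end

# Which source columns must be read to evaluate the plan?
# `wanted === nothing` means "every column the child produces".
required(p::Scan, wanted) = wanted === nothing ? collect(keys(p.table)) : wanted
required(p::Select, wanted) = required(p.child, p.cols)
required(p::Filter, wanted) =
    required(p.child, wanted === nothing ? nothing : union(wanted, p.cols))
required(p::Plan) = required(p, nothing)

t = (name = ["a", "b", "c"], age = [25, 40, 31], city = ["x", "y", "z"])

# Roughly: SELECT name FROM t WHERE age > 30
plan = Select(Filter(Scan(t), [:age], age -> age > 30), [:name])

# The planner can see that :city is never touched.
cols = required(plan)

# With an opaque predicate such as `row -> row.age > 30`, the planner cannot
# tell which fields the closure reads, so it has to assume all of them.
```

The same declared-column information is what would let a planner reorder or merge operators safely; column pruning is just the simplest case to show.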

The package I am developing is primarily targeted at working with in-memory data, and secondarily targeted at working with larger-than-memory data. Distributed data is a distant third.
