A serious data start-up structured around a Julia data manipulation framework for larger-than-RAM data

In addition to fast user-defined functions, which @xiaodai mentioned, there are at least two other advantages to being able to write queries in Julia itself. I use Polars at work, so my examples are in Polars, but the same patterns would carry over to Julia queries.

Code reuse

Reusing query logic in SQL is awkward, but it is easy in Polars: a helper function can build and return a reusable expression. Here’s an example:

import polars as pl

def shift_over(col_name, n):
    # Build a reusable expression: shift within each
    # (serial_number, trip) window, ordered by timestamp.
    return (
        pl.col(col_name)
        .shift(n)
        .over(
            partition_by=["serial_number", "trip"],
            order_by="timestamp",
        )
    )

df2 = df.with_columns(
    a_lag_1  = shift_over("a", 1),
    a_lead_1 = shift_over("a", -1),
    b_lag_1  = shift_over("b", 1),
    b_lead_1 = shift_over("b", -1),
)

Programmatically generate columns

It’s easy to programmatically generate new columns in Polars; not so easy in SQL. Here’s an example that combines code reuse with programmatically generated columns:

import polars as pl

def forward_circular_shift(col_name, n):
    # Move the last n values to the front of the column,
    # wrapping around instead of filling with nulls.
    return (
        pl.col(col_name)
        .tail(n)
        .append(
            pl.col(col_name).head(pl.len() - n)
        )
        .alias(col_name + f"_lag_{n}")
    )

df2 = (
    df
    .with_columns(
        [
            forward_circular_shift(col_name, n)
            for col_name in ["a", "b", "c", "d", "e"]
            for n in [1, 2, 3, 4, 5]
        ]
    )
)