Help to build a django-like orm in julia

I’m spending some time these last days studying the viability of building a Django-like orm in julia. I made some progress, but I haven’t had much luck with the performance. I’ve used the searchlight with the start point. Anyone interested in helping?

An example:

a = object("tb_fat_visita_domiciliar")
.values("co_dim_tempo__dt_registro__y_month", "co_seq_fat_visita_domiciliar", "co_fat_cidadao_pec__co_fat_cad_domiciliar")
a.filter(Qor("co_seq_fat_visita_domiciliar__isnull" => true, Q("co_dim_tempo__dt_registro__y_month__gte" => "2023-01", "co_dim_tempo__dt_registro__y_month__lte" => "2025-01")))
a.query()
┌ Info:  Query returned:
│ SELECT
│   to_char(tb_1.co_dim_tempo, 'YYYY-MM') as co_dim_tempo__dt_registro__y_month,
│   tb.co_seq_fat_visita_domiciliar,
│   tb_2.co_fat_cidadao_pec as co_fat_cidadao_pec__co_fat_cad_domiciliar
│ FROM tb_fat_visita_domiciliar as tb
│  LEFT JOIN tb_dim_tempo tb_1 ON tb.co_dim_tempo = tb_1.co_seq_dim_tempo
│  LEFT JOIN tb_fat_cidadao_pec tb_2 ON tb.co_fat_cidadao_pec = tb_2.co_seq_fat_cidadao_pec
└ WHERE (tb.co_seq_fat_visita_domiciliar IS NULL OR (to_char(tb_1.co_dim_tempo, 'YYYY-MM') >= '2023-01' AND to_char(tb_1.co_dim_tempo, 'YYYY-MM') <= '2025-01'))
  0.004621 seconds (48.53 k allocations: 1.661 MiB)

I think that 0,005 seconds is very time to build a simple query and this memory allocation is even more limiting.

The function with I’ve problem is the _build_row_join and stays in src/QueryBuilder.jl

You have a lot of allocations there…

People here will absolutely help make this a few orders of magnitude faster, but you need full runnable MWE to get them interested.

(like all self contained here, not in your package)

1 Like

In case you’re interested, there are already a couple Julia packages for programmatically creating SQL queries:

2 Likes

Hi @CameronBieganek, in fact, those packages are cool, and this approach is used for many other languages (frameworks), but django’s approach is good to more simply query, but that uses many joins, common in backend servers.

So I decided to try something, but if I couldn’t improve performance, I’ll stop here.

I think an integration into Prequel prql for Julia might be helpful if you needed to use it beyond the existing duckdb extension. It might help to transition logic between languages.

2 Likes

I’ve been continuing my work on a hobby project that I use to learn Julia in my spare time.

I work with public health data, which requires a lot of read queries—something FunSQL.jl is great for. However, I also need to perform many write queries. In addition to this, I have a Django application that requires a large-scale ETL process, which I also use Julia for. While Django is amazing for many things, Python can be too slow for some of my critical needs. As a result, I use Julia as a microservice to handle a few processes for Django.

Although I have already improved the speed and memory allocations, I’ve put further optimization on the back burner for now. My main objective at this stage is to make the application easier for me to understand (and yes, that includes using @infiltrate a lot in my app).

@time begin
           query = M.Result |> object;
           query.filter("statusid__status" => "Finished", "driverid__forename" => "Ayrton");
           query.values("raceid__circuitid__name", "driverid__forename", "constructorid__name", "count_grid" => Count("grid"), "max_grid" => Max("grid"), "min_grid" => Min("grid"));
           query.order_by("raceid__circuitid__name");
           query |> show_query
       end
  0.000212 seconds (747 allocations: 41.250 KiB)

Currently, PormG.jl only supports PostgreSQL. I’ve been focusing on refining the bulk update operations, but of course, there’s still a lot to do.

julia> query = M.Just_a_test_deletion |> object;

julia> query.filter("test_result" => 1);

julia> query.update("name" => "test_update")

julia> query = M.Just_a_test_deletion |> object;

julia> df = query |> DataFrame
3×4 DataFrame
 Row │ test_result  id     name           test_result2 
     │ Int64        Int64  String         Int64?       
─────┼─────────────────────────────────────────────────
   1 │           2    186  test_update_1       missing 
   2 │           3    187  test_update_2       missing 
   3 │           1    185  test_update              44

julia> query = M.Just_a_test_deletion |> object;

julia> for (index, row) in eachrow(df) |> enumerate
         row.name = "test_update_$(index)"
       end

julia> bulk_update(query, df, columns=["name"], filters=["id"], show_query=true)
┌ Info: UPDATE "just_a_test_deletion" AS "Tb"
│ SET "name" = source."name"::varchar
│ FROM (VALUES ($1, $2),($3, $4),($5, $6)) AS source ("name","id")
└ WHERE "Tb"."id" = source."id"::bigint

I haven’t published PormG.jl as a package yet. However, I’ve used AI to generate the documentation, so if you want to see the project’s current stage, you can check it out here:

And of course, I am looking for anyone interested in helping to develop a Django-like ORM.

2 Likes

I wish Julia had a Polars-like syntax for dataframe queries, similar to:

@pipe df |>
sort(col(:Date)) |>
filter(col(:Price) > 5.4321) |>
select(:Date, :Price, alias(col(:Price) * 5, :MuchPrice))

The key here is the ability to directly do math on columns. I think this is great API:

  1. No need to pass clunky lambdas to filter and select calls.
  2. Just write what you mean: want to multiply two columns? Just write it out: col(:a) * col(:b)!

DataFramesMeta.jl does this.

using Statistics 

df = DataFrame(a = repeat(1:5, outer = 20),
               b = repeat(["a", "b", "c", "d"], inner = 25),
               x = repeat(1:20, inner = 5))

x_thread = @chain df begin
    @transform(:y = 10 * :x)
    @subset(:a .> 2)
    @by(:b, :meanX = mean(:x), :meanY = mean(:y))
    @orderby(:meanX)
    @select(:meanX, :meanY, :var = :b)
end
2 Likes