DuckDB.jl does not free memory

I am having an issue with memory allocation using the DuckDB.jl package. I am querying subsets of a large table (in a database stored in a file) and writing each result to a Parquet file. Given a table with columns id1, id2, first_date, last_date, the following eventually runs out of memory:

using DuckDB

con = DBInterface.connect(DuckDB.DB, "database.db")
DBInterface.execute(con, "PRAGMA threads=8;")
for curdate in query_dates
    DBInterface.execute(con, """
          COPY (SELECT *
          FROM my_large_table
          WHERE (CAST('$curdate' AS DATE) + INTERVAL 6 DAY) >= first_date
          AND '$curdate' <= last_date)
          TO 'data/$(curdate).parquet' (COMPRESSION ZSTD);
    """)
end

An equivalent query using Python works (it fills the cache but frees memory when needed):

# %%
import duckdb
from tqdm import tqdm

con = duckdb.connect(
    "database.db",
    read_only = True)

def make_querystring(curdate):
    return f"""
        COPY (SELECT *
        FROM my_large_table
        WHERE (CAST('{curdate}' AS DATE) + INTERVAL 6 DAY) >= first_date
        AND '{curdate}' <= last_date)
        TO 'data/{curdate}.parquet' (COMPRESSION ZSTD);
    """

for cur_date in tqdm(date_list):
    cur_q: str = make_querystring(cur_date)
    con.execute(cur_q)
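
To check exactly which statement each iteration sends, the query string can be built and printed without opening the database. This is only a rough sketch using the standard library: `build_copy_statement` and `date_range` are hypothetical helper names (not part of DuckDB), the `SELECT *` column list is an assumption, and the dates are made up for illustration.

```python
from datetime import date, timedelta


def build_copy_statement(curdate: date, out_dir: str = "data") -> str:
    """Return the COPY statement for one query date (hypothetical helper).

    Assumes all columns are exported (SELECT *).
    """
    day = curdate.isoformat()  # ISO format, e.g. "2024-01-01"
    return (
        "COPY (SELECT * FROM my_large_table "
        f"WHERE (CAST('{day}' AS DATE) + INTERVAL 6 DAY) >= first_date "
        f"AND '{day}' <= last_date) "
        f"TO '{out_dir}/{day}.parquet' (COMPRESSION ZSTD);"
    )


def date_range(start: date, n_days: int) -> list[date]:
    """Hypothetical stand-in for date_list: n_days consecutive dates."""
    return [start + timedelta(days=i) for i in range(n_days)]


if __name__ == "__main__":
    # Print the statements instead of executing them.
    for d in date_range(date(2024, 1, 1), 3):
        print(build_copy_statement(d))
```

Each printed statement could then be pasted into the DuckDB CLI, which may help to tell whether the memory growth comes from the query itself or from the Julia client.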

Any ideas how I can force DuckDB.jl to free memory (closing the result or the database after every query did not work in my test)?



That’s because of a bug where it doesn’t actually close the database :laughing:

There is an existing GitHub issue where I pointed to a possible solution, but evidently it wasn’t good enough: some people reported that my fix didn’t solve the problem.

I definitely run into problems like this. One thing I will say is that a lot of fixes have been merged and will be available in the next release. Their release cadence seems unusually slow for such an early-stage project. If you are willing to build the feature or master branch yourself, it is very possible that your issue is already fixed.