Calling database functions when using Octo.jl

I’m trying out Octo.jl with DuckDB. I managed to connect to an existing file and query a table there.

julia> import DataFrames as DF;

julia> using Octo.Adapters.DuckDB

julia> con = Repo.connect(adapter=Octo.Adapters.DuckDB, file="test/data/norse.duckdb")
Octo.Repo.Connection(false, "DuckDB", Main.DuckDBLoader, Octo.Adapters.DuckDB, DuckDB.DB("test/data/norse.duckdb"))

julia> struct Assets end

julia> Schema.model(Assets, table_name="alt_assets", primary_key="name")
| primary_key   | table_name   |
| ------------- | ------------ |
| name          | alt_assets   |

julia> assets = from(Assets)
FromItem alt_assets

julia> [SELECT * FROM assets LIMIT 5]
SELECT * FROM alt_assets LIMIT 5

julia> Repo.query([SELECT * FROM assets LIMIT 5]) |> DF.DataFrame
5Γ—21 DataFrame
 Row β”‚ name             type        active  investable  investment_integer  variable_cost  inv β‹―
     β”‚ String?          String?     Bool?   Bool?       Bool?               Float64?       Flo β‹―
─────┼──────────────────────────────────────────────────────────────────────────────────────────
   1 β”‚ Asgard_Battery   storage       true        true                true         0.003       β‹―
   2 β”‚ Asgard_Solar     producer      true        true                true         0.001
   3 β”‚ Asgard_E_demand  consumer      true       false               false         0.0
   4 β”‚ Asgard_CCGT      conversion    true       false                true         0.0015
   5 β”‚ G_imports        producer      true       false               false         0.08        β‹―

However now I want to query CSV files (the reason for using DuckDB), but I’m unsure how to call database functions like read_csv_auto.

Hoping to get some hints on how to proceed.

I think you could use the Raw() function to wrap your plain SQL command that you want to send to DuckDB, at a minimum.

FWIW, I typically use DBInterface.jl for DuckDB.

Thanks for pointing out Raw(). However I’m wrapping DuckDB for a workflow library for not-so-technical people. I was looking at Octo because I want to replace my current string based implementation. But if I’ve to use Raw, that doesn’t really work.

I also found FunSQL.jl. That seems to have wider feature support. Maybe that’s a better bet.

Have you tried making a View in duckdb that selects from the files that you want to read, which will allow you to specify the arguments to the csv reader and glob file names?

Also if you use a single quote string as the table name to sql it will automatically read the file as a view; for example "select * from 'file.csv'". Can you pass the table_name as "'file.csv'"

1 Like

Interesting point about using the bare filename feature, unfortunately the files I’m reading now needs skip=1, and in the future files can be from any source, so I would need the full flexibility offered by the options. At the moment I format regular Julia keyword arguments to kwd1=value1, kwd2=value2.

@suvayu TidierDB.jl may also be of use to you.

TidierDB.copy_to allows you to copy .json, .csv, .arrow, and .parquet to your database to then be queried via duckdb by way of Tidier.jl syntax.

using TidierDB
db = connect(:duckdb)
path = # can be a website url or  df too
copy_to(db, path, "table_name")

If something is missing or would be valuable, please let me know !

2 Likes