DuckDB.jl and S3?

I noticed to my delight that there is a DuckDB.jl package for DuckDB.

I now want to read in some Parquet data from an in-house S3 storage. According to the documentation I need the httpfs extension for DuckDB, but how should I run the INSTALL 'fts'; command?

This is not the right way to do it:

julia --project=. --threads=32
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0 (2023-05-07)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using DuckDB

julia> con = DBInterface.connect(DuckDB.DB, ":memory:")
DuckDB.DB(":memory:")

julia> # list extensions
       DBInterface.execute(con, "select * from duckdb_extensions()")
13Γ—6 DataFrame
 Row β”‚ extension_name    loaded  installed  install_path  description                        aliases                           
     β”‚ String?           Bool?   Bool?      String?       String?                            Array…?                           
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 β”‚ autocomplete       false      false                Add supports for autocomplete in…  Union{Missing, String}[]
   2 β”‚ fts                false      false                Adds support for Full-Text Searc…  Union{Missing, String}[]
   3 β”‚ httpfs             false      false                Adds support for reading and wri…  Union{Missing, String}["http", "…
   4 β”‚ icu                 true       true  (BUILT-IN)    Adds support for time zones and …  Union{Missing, String}[]
   5 β”‚ inet               false      false                Adds support for IP-related data…  Union{Missing, String}[]
   6 β”‚ jemalloc            true       true  (BUILT-IN)    Overwrites system allocator with…  Union{Missing, String}[]
   7 β”‚ json               false      false                Adds support for JSON operations   Union{Missing, String}[]
   8 β”‚ motherduck         false      false                Enables motherduck integration w…  Union{Missing, String}["md"]
   9 β”‚ parquet             true       true  (BUILT-IN)    Adds support for reading and wri…  Union{Missing, String}[]
  10 β”‚ postgres_scanner   false      false                Adds support for reading from a …  Union{Missing, String}["postgres…
  11 β”‚ sqlite_scanner     false      false                Adds support for reading SQLite …  Union{Missing, String}["sqlite",…
  12 β”‚ tpcds              false      false                Adds TPC-DS data generation and …  Union{Missing, String}[]
  13 β”‚ tpch               false      false                Adds TPC-H data generation and q…  Union{Missing, String}[]

julia> DBInterface.execute(con, "INSTALL httpfs")
ERROR: Execute of query "INSTALL httpfs" failed: HTTP Error: Failed to download extension "httpfs" at URL "http://extensions.duckdb.org/ae9bcc347c/linux_amd64/httpfs.duckdb_extension.gz"
Extension "httpfs" is an existing extension.

Are you using a development build? In this case, extensions might not (yet) be uploaded.
Stacktrace:
 [1] execute(stmt::DuckDB.Stmt, params::NamedTuple{(), Tuple{}})
   @ DuckDB ~/.julia/packages/DuckDB/PMwHV/src/result.jl:733
 [2] execute
   @ ~/.julia/packages/DuckDB/PMwHV/src/result.jl:834 [inlined]
 [3] execute
   @ ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:130 [inlined]
 [4] #execute#2
   @ ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:152 [inlined]
 [5] execute(conn::DuckDB.DB, sql::String)
   @ DBInterface ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:152
 [6] top-level scope
   @ REPL[4]:1

julia> 

That error is telling you it is already installed. You just need to load it.

How do I load it?

julia> DBInterface.execute(con, "LOAD httpfs")
ERROR: Execute of query "LOAD httpfs" failed: IO Error: Extension "/home_volume/arihuttunen/.duckdb/extensions/ae9bcc347c/linux_amd64/httpfs.duckdb_extension" not found.
Extension "httpfs" is an existing extension.

Install it first using "INSTALL httpfs".
Stacktrace:
 [1] execute(stmt::DuckDB.Stmt, params::NamedTuple{(), Tuple{}})
   @ DuckDB ~/.julia/packages/DuckDB/PMwHV/src/result.jl:733
 [2] execute
   @ ~/.julia/packages/DuckDB/PMwHV/src/result.jl:834 [inlined]
 [3] execute
   @ ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:130 [inlined]
 [4] #execute#2
   @ ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:152 [inlined]
 [5] execute(conn::DuckDB.DB, sql::String)
   @ DBInterface ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:152
 [6] top-level scope
   @ REPL[5]:1

There could be a problem with the extension itself. I assume this should be able to download it?

wget http://extensions.duckdb.org/ae9bcc347c/linux_amd64/httpfs.duckdb_extension.gz
--2023-05-12 13:23:57--  http://extensions.duckdb.org/ae9bcc347c/linux_amd64/httpfs.duckdb_extension.gz
Resolving extensions.duckdb.org (extensions.duckdb.org)... 108.156.22.40, 108.156.22.33, 108.156.22.54, ...
Connecting to extensions.duckdb.org (extensions.duckdb.org)|108.156.22.40|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-05-12 13:23:58 ERROR 403: Forbidden.

Yea it seems like perhaps they have moved them? They are pretty active. I’ll make an issue.

1 Like

I will say that I am enjoying using DuckDB but less-so with the Julia client. The CLI and the Python client are getting all of the love, so I have been using those for now.

Told you they were active. They’ve already responded:

Thanks for the report! The latest version of the Julia client is currently on a different version which does not have extensions available unfortunately. We will resolve this when we release v0.8.0 next week.

3 Likes

Excellent. I tried a hack by installing the extension with Python, and copying it, which gives this error that’s consistent with what you just said.

..extension..  version (v0.7.1) does not match DuckDB version (v0.7.2-dev1987)

I’ll just use the Python version for a while then.

Yea it is a bit roundabout, but you can use Julia β†’ PythonCall β†’ DuckDB. Hopefully that isn’t always the case.

At least we got some more ground-level visibility for Julia there. :julia:

2 Likes

This is great, but are you actually able to use the fts extension? If I try to activate it with DBInterface.execute(con, 'PRAGMA create_fts_index(…)') I get the following exception:

ERROR: Invalid Input Error: Cannot prepare multiple statements at once!
Stacktrace:
  [1] DuckDB.Stmt(con::DuckDB.Connection, sql::String, result_type::Type)
  …

(I mean, if I put the DB in a file, I could always just execute the CLI from Julia to get the index in place, but …)

Added an issue, and got a response almost immediately :open_mouth: It seems there’s a fix in the pipeline.

The issue is DBInterface.execute() is dispatched to the duckdb c api function to execute prepared statements, which is an issue for some query strings. Instead we can call the duckdb_query function which is a separate function in the c api. However, I don’t understand how a Julia user would expect to have that additional function mapped to the DBInterface function.

It was merged into the main dev branch today if you want to try it out as DuckDB.query()

I think someone with more DBInterface experience could review if it is appropriate to map it into an appropriate call in DBInterface.

1 Like