DuckDB.jl and S3?

I noticed to my delight that there is a DuckDB.jl package for DuckDB.

I now want to read in some Parquet data from an in-house S3 storage. According to the documentation I need the httpfs extension for DuckDB, but how should I run the INSTALL 'fts'; command?

This is not the right way to do it:

julia --project=. --threads=32
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.0 (2023-05-07)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> using DuckDB

julia> con = DBInterface.connect(DuckDB.DB, ":memory:")
DuckDB.DB(":memory:")

julia> # list extensions
       DBInterface.execute(con, "select * from duckdb_extensions()")
13Γ—6 DataFrame
 Row β”‚ extension_name    loaded  installed  install_path  description                        aliases                           
     β”‚ String?           Bool?   Bool?      String?       String?                            Array…?                           
─────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 β”‚ autocomplete       false      false                Add supports for autocomplete in…  Union{Missing, String}[]
   2 β”‚ fts                false      false                Adds support for Full-Text Searc…  Union{Missing, String}[]
   3 β”‚ httpfs             false      false                Adds support for reading and wri…  Union{Missing, String}["http", "…
   4 β”‚ icu                 true       true  (BUILT-IN)    Adds support for time zones and …  Union{Missing, String}[]
   5 β”‚ inet               false      false                Adds support for IP-related data…  Union{Missing, String}[]
   6 β”‚ jemalloc            true       true  (BUILT-IN)    Overwrites system allocator with…  Union{Missing, String}[]
   7 β”‚ json               false      false                Adds support for JSON operations   Union{Missing, String}[]
   8 β”‚ motherduck         false      false                Enables motherduck integration w…  Union{Missing, String}["md"]
   9 β”‚ parquet             true       true  (BUILT-IN)    Adds support for reading and wri…  Union{Missing, String}[]
  10 β”‚ postgres_scanner   false      false                Adds support for reading from a …  Union{Missing, String}["postgres…
  11 β”‚ sqlite_scanner     false      false                Adds support for reading SQLite …  Union{Missing, String}["sqlite",…
  12 β”‚ tpcds              false      false                Adds TPC-DS data generation and …  Union{Missing, String}[]
  13 β”‚ tpch               false      false                Adds TPC-H data generation and q…  Union{Missing, String}[]

julia> DBInterface.execute(con, "INSTALL httpfs")
ERROR: Execute of query "INSTALL httpfs" failed: HTTP Error: Failed to download extension "httpfs" at URL "http://extensions.duckdb.org/ae9bcc347c/linux_amd64/httpfs.duckdb_extension.gz"
Extension "httpfs" is an existing extension.

Are you using a development build? In this case, extensions might not (yet) be uploaded.
Stacktrace:
 [1] execute(stmt::DuckDB.Stmt, params::NamedTuple{(), Tuple{}})
   @ DuckDB ~/.julia/packages/DuckDB/PMwHV/src/result.jl:733
 [2] execute
   @ ~/.julia/packages/DuckDB/PMwHV/src/result.jl:834 [inlined]
 [3] execute
   @ ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:130 [inlined]
 [4] #execute#2
   @ ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:152 [inlined]
 [5] execute(conn::DuckDB.DB, sql::String)
   @ DBInterface ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:152
 [6] top-level scope
   @ REPL[4]:1

julia> 

That error is telling you it is already installed. You just need to load it.

How do I load it?

julia> DBInterface.execute(con, "LOAD httpfs")
ERROR: Execute of query "LOAD httpfs" failed: IO Error: Extension "/home_volume/arihuttunen/.duckdb/extensions/ae9bcc347c/linux_amd64/httpfs.duckdb_extension" not found.
Extension "httpfs" is an existing extension.

Install it first using "INSTALL httpfs".
Stacktrace:
 [1] execute(stmt::DuckDB.Stmt, params::NamedTuple{(), Tuple{}})
   @ DuckDB ~/.julia/packages/DuckDB/PMwHV/src/result.jl:733
 [2] execute
   @ ~/.julia/packages/DuckDB/PMwHV/src/result.jl:834 [inlined]
 [3] execute
   @ ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:130 [inlined]
 [4] #execute#2
   @ ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:152 [inlined]
 [5] execute(conn::DuckDB.DB, sql::String)
   @ DBInterface ~/.julia/packages/DBInterface/1Gmxx/src/DBInterface.jl:152
 [6] top-level scope
   @ REPL[5]:1

There could be a problem with the extension itself. I assume this should be able to download it?

wget http://extensions.duckdb.org/ae9bcc347c/linux_amd64/httpfs.duckdb_extension.gz
--2023-05-12 13:23:57--  http://extensions.duckdb.org/ae9bcc347c/linux_amd64/httpfs.duckdb_extension.gz
Resolving extensions.duckdb.org (extensions.duckdb.org)... 108.156.22.40, 108.156.22.33, 108.156.22.54, ...
Connecting to extensions.duckdb.org (extensions.duckdb.org)|108.156.22.40|:80... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-05-12 13:23:58 ERROR 403: Forbidden.

Yea it seems like perhaps they have moved them? They are pretty active. I’ll make an issue.

1 Like

I will say that I am enjoying using DuckDB but less-so with the Julia client. The CLI and the Python client are getting all of the love, so I have been using those for now.

Told you they were active. They’ve already responded:

Thanks for the report! The latest version of the Julia client is currently on a different version which does not have extensions available unfortunately. We will resolve this when we release v0.8.0 next week.

2 Likes

Excellent. I tried a hack by installing the extension with Python, and copying it, which gives this error that’s consistent with what you just said.

..extension..  version (v0.7.1) does not match DuckDB version (v0.7.2-dev1987)

I’ll just use the Python version for a while then.

Yea it is a bit roundabout, but you can use Julia β†’ PythonCall β†’ DuckDB. Hopefully that isn’t always the case.

At least we got some more ground-level visibility for Julia there. :julia:

1 Like