How to execute .sql file

Hi, I am just getting started with using databases in Julia and am confused how I could read SQL commands from a file (for example, test.sql as below). I am using DuckDB.jl but the question is generic across SQLite.jl, etc. Is there some function that does something like the .read command of SQLite, for example?

CREATE TABLE tab1 (
    _id INTEGER PRIMARY KEY,
    col1 TEXT
);

CREATE TABLE tab2 (
    _id INTEGER PRIMARY KEY,
    col1 TEXT
);
1 Like

You can read any text file from disk by doing something like the following.

sql = read("create.sql", String)

You can send that sql string to SQLite just like you would any other SQL query.

using SQLite
db = SQLite.DB("test.db")
result = DBInterface.execute(db, sql)

If you think you’ll be doing this a lot, you could make a function out of it.

using DataFrames
function read_sql(db::SQLite.DB, filename::AbstractString)
    sql = read(filename, String)
    return DBInterface.execute(db, sql) |> DataFrame
end

And you could use it like this:

julia> df = read_sql(db, "count.sql")
1Ă—1 DataFrame
 Row │ COUNT(*) 
     │ Int64    
─────┼──────────
   1 │    18657

In this case, count.sql contained “SELECT COUNT(*) FROM schedule;” which makes sense in the context of one of my local SQLite databases.

Do you get it?

I haven’t used DuckDB before, but you could probably do something very similar for it.

4 Likes

Thanks @g-gundam it makes sense in general.

Unfortunately in my specific case, using the small test I provided I got the ERROR: Invalid Input Error: Cannot prepare multiple statements at once! error. Do the interfaces for Julia databases have something similar to executescript in Python’s SQLite API that allows one to execute all statements in a .sql file?

https://juliadatabases.org/DBInterface.jl/dev/#DBInterface.executemultiple

There is a DBInterface.executemultiple, but I discovered that it doesn’t work for SQLite. If the sql file contains multiple SQL queries, it just runs the first one. I don’t know about DuckDB. If executemultiple doesn’t work out for you, I would split the sql on ; and loop through it myself.

function read_sql(db, filename)
    sql = read(filename, String)
    queries = filter(q -> isnothing(match(r"^\s*$", q)), split(sql, ";"; keepempty=false))
    results = []
    for q in queries
        @info q
        res = DBInterface.execute(db, q)
        push!(results, res)
    end
    return results
end

That’s all the help I got in me today.

1 Like

You could do

run(pipeline(`cat statements.sql`, `sqlite3 test.db`))

but on Windows it would be type instead of cat.

3 Likes

Thanks all! I tested both SQLite and DuckDB using execute and executemultiple and they both give the same error.

using DuckDB
sql = read("./test.sql", String)
db = DuckDB.DB("test.db")
result = DBInterface.executemultiple(con, sql)

using SQLite
sql = read("./test.sql", String)
db = SQLite.DB("test.db")
result = DBInterface.executemultiple(con, sql)

I believe that in DuckDB’s case at least, the Python API had this similar issue until this feature was added by PR so I opened a feature request on their GitHub: Julia API should be able to execute multiple statements in one API call · duckdb/duckdb · Discussion #15264 · GitHub

This is already implemented in DuckDB.query()

2 Likes

Ah, I see. Thanks, I was only looking at the docs for DBInterface.jl

Does this mean there is some shortcoming in DBInterface.executemultiple, given it does not execute multiple statements for either SQLite or DuckDB? I’m not very familiar with the Julia Databases ecosystem.

It’s currently not possible for SQLite.jl:

A workaround is to execute single statement sequentially.
We can also implement DBInterface.executemultiple correctly with the method described in SQLite forum which is similar with executescript in Python.
All C API functions in SQLite have been exposed now so you can propose a PR if you are interested in it.

It seems rather simple to implement, not done directly by SQLite for good reasons, and I think I located how here the loop that must be implemented (note the tail part, in other contexts I see NULL there):

https://www.sqlite.org/capi3ref.html#sqlite3_prepare

The preferred routine to use is sqlite3_prepare_v2(). The sqlite3_prepare() interface is legacy and should be avoided. sqlite3_prepare_v3() has an extra “prepFlags” option that is used for special purposes.

As you noticed, it’s by design in databases to only run one query (at a time), e.g. in general you want to retrieve the query results, and it can also be a security risk to allow more than one (if you think you have and someone SQL injected more).

That said, there are tools for e.g. bulk loading data for databases like for PostgreSQL, and I guess SQLite too, just not too familiar with it.

A workaround is using the shell, and the tool for this, as I see already suggest here, or using a different database where executemultiple API call is implemented, or implement it for SQLite.jl.

Another workaround is you can use some API in another language (if you like it more than from the shell, possibly easier for cross-platform code), such as in Python, with PythonCall.jl, since I know it implemented there, or I suppose some Rust API has this.

2 Likes

Right. Typically something gets inserted into a database with such SQL scripts which are out in the wild. In database jargon a “query” includes INSERT statements, though nothing gets “queried”.

An executemultiple would not be difficult to implement, but not entirely so. Quoted semicolons and new lines inside strings can occur, error codes need to be checked. Therefore I would recommend to just pipe a mulit-statement script into the sqlite3 CLI.

1 Like

No, you’re thinking of parsing, that would be nontrivial yes, but if you look at the loop in Python I pointed to, you see you don’t need to find the end of each statement, sqlite3_prepare_v2 does it for you and gives you a pointert to the next statement with tail, so it’s a rather trivial loop.

Right, that’s why SQL in not (just) a query language, despite the Q meaning that… Actually it’s neither structured [programming or procedural language, as opposed to “structured English”], query, or a language…

That said even inserts can return values:

https://www.sqlite.org/lang_returning.html

SQLite’s syntax for RETURNING is modelled after PostgreSQL.

I.e. I’m not sure this is in the SQL standard, seemingly not, but a useful extension:

Use of RETURNING avoids performing an extra database query to collect the data, and is especially valuable when it would otherwise be difficult to identify the modified rows reliably.

Off topic:
SQL is not a language, as in it has many variant languages/proprietary extensions (and the official SQL has many official extensions: SQL-86 … SQL-92 … SQL:2023) and:

please also recognize that SQL is not a language like you would think of an object-oriented language or procedural language. In many ways the ANSI SQL standard is more like a protocol

Yes, it is true that SQL does not stand for Structured Query Language. SQL was originally developed by IBM in the 1970s. It originated before the concept of a structured language was developed as a defense against the horrors of “spaghetti code.” Unstructured languages allowed you to jump from one place in a program to another, usually with a GO TO command. SQL, originally named SEQUEL contained that jumping capability and still does to this day with:

WHENEVER GO TO ;

SEQUEL was an acronym for Structured English QUEry Language. This clever name was used because SEQUEL statements were very much like English-language sentences, but were more highly structured. It was structured English. SEQUEL was also a query language. However, it was not a structured query language. IT was not a structured language of any kind. It was and it remains an unstructured query language. The idea that SQL is an acronym for Structured Query Language was retconned onto the language by people who did not know the history and who made the assumption that structured query language is what the letters SQL must stand for. While in the SEQUEL acronym S stood for Structured, Q stood for Query, and L stood for Language, that is not what those letters stand for in SQL. In fact they don’t stand for anything, just like C does not stand for anything in the C language.

1 Like