A prototype for a minimalistic universal database API

Hello all, I have written an extremely minimal universal database API in the style of PEP249. It is so minimal, in fact, that by design you should not even have to import this package to implement it.

My goal was to provide an interface which allows you to change database back-ends by only changing your connection string and a single using or import statement. The interface should be able to:

  • Connect to the database using only a connection AbstractString.
  • Execute a query from an AbstractString containing SQL code.
  • Push or pull data using DataStreams.

Note that I have no intention of having any sort of SQL-like domain-specific query language.

You can find the interface here and an example implementation in JDBC here. (I’m probably still missing some important bits, but a large class of transactions should be possible with what is already here.)

Here is an example query using JDBC:

# create connection
cnxn = Connection("jdbc:derby:jar:(toursdb.jar)toursdb")
# create a cursor
cursor(cnxn)
# execute a query
execute!(csr, "select * from airlines")
# load into DataFrame or any other DataStreams sink
df = load(DataFrame, csr)

# or you can skip most of the above steps and go straight to
df = load(DataFrame, cursor(cnxn), "select * from airlines")

This is of course very close to the existing database interfaces, so in most cases it will take very little work to get them to implement this.

I will have use of this for JDBC and Postgres, so I will commit to JDBC.jl and create a PostgreSQL.jl which adds this simple interface over LibPQ.jl. (I will contact the maintainers of the JuliaDatabases repositories to see if they are willing to let me use those, I am already an admin on JDBC.)

Things I’d still like to do:

  • Uploading to database (using DataStreams sink objects).
  • Possibly clean up or standardize object constructors so that users of the interface won’t have to change constructor code when swapping out back-ends (this is already possible, but I have to make some choices).
  • I think I’d like to make a standard toml format so that people could just stick a toml in their project with the appropriate connecton strings and instantly have an easy interface for their various databases.
  • Somewhat more ambitious, automatically select a back-end based on configuration toml. This is a little iffy because it would involve conditional loading or code-generation.

Any feedback is welcome. On the other thread there seemed to be a lot of interest in this, but I want to see if it is still there. My own situation has improved somewhat and increasingly I am only having to use Postgres, so the usefulness of this for my own personal case has diminished somewhat.

3 Likes

One thought on the standard TOML format you envision for a very generic way of being quite specific about one’s database application, and some of the time: project, datastore path[s], personal identifier, and one’s credentials. and more like that.

If this effort had started in 2009, the format would have been very helpful to and successful for users/designers of the major RDBMS systems. Months later, the NOSQL variants become widely used, as do (to a lesser extent) a temporal database and a hierarchical database. And let us not omit PROLOG … it does what it does by being both a DB and a Horn clause resolver.

So design the format as a flexible yet logically consistent accommodation to the ways we will want to find it when using approaches/connectivities that, on this Sunday night, remain unwritten.

It is so minimal I would not call it DPAPI. Its usability is very limited (I don’t say it is unusable!).

For example something like fetchone or rows iterator from old DBAPI and many more are IMHO necessary.

I see there is rows implemented in JDBC, it is useful so why not to export it into specification? (BTW isn’t better to allocate row only one time in JDBCRowIterator struct and not many times in Base.iterate?)

What about to define iterate protocol on Cursor itself?

I am also thinking about using NamedTuple instead of tuple as return type from Base.iterate. Would it be possible and without performance penalty?

I would like to see also something like cursor description to see column names and types returned from for example execute!(cursor, "select * from table").

And I think good DBAPI has to define also optional specifications to enhance future compatibility (why not to borrow for example two phase commit functionality?).

You are referring to the much more difficult problem of actually transferring data which is already solved by DataStreams. So, for example, with this you would just do

Data.schema(source(csr))

on the cursor, and you would have an object with all of the metadata that is available.

Again, same thing.

Thanks for pointing that out, that is exactly the sort of thing which I would like for it to be possible to extend this to support, even if nothing is written to do so initially. I’ll look into it more and try to make sure I don’t get in its way in the next iteration.

So, you may ask, why bother with this at all of I’m just saying “I’m not going to do any work, DataStreams does it”? Well, as things are right now there is no guarantee of how you get to the point of actually having a Data.Source or Data.Sink. This becomes a problem if you have a dozen different tables living in 3 or 4 different types of databases you will probably have to devote some time to figuring out how to do all the queries. Additionally, if you have a table that gets moved to a different source on a different database (this is something that has actually happened to me a number of times), you will have to actually go in and change your source code to accommodate this. If I do this right, these are the problems that should be solved by this interface.

Agreed, I don’t particularly like the name either, it was taken from the previously existing JuliaDatabases/DBAPI.jl package. Perhaps something like DBStreams, to emphasize that it is simply a hook into DataStreams.

Shouldn’t it be close!() (with exclamation mark) as it is mutating?