Status of Julia database libraries

ScottPJones · June 25, 2018, 3:52pm

This is a point I keep coming back to (in many different areas).
A good upfront design/architecture always beats after the fact optimizations.

That’s really the not-so-secret “secret” to Julia’s success IMO, in programmer performance (productivity), in meeting or beating C/C++ performance at run-time, etc.
The design from the co-creators is responsible for all of that.
Now it’s a matter for all of us to help contribute in fleshing out the many different possibilities that this enables.
It’s too big of a job for just JC, quite frankly (no matter how talented they all are).

ScottPJones · June 25, 2018, 3:58pm

I agree that it’s a good starting point, and I’d submit it’s critical to have.
However, while I feel it’s good enough for casual database use, it will never allow Julia applications to be competitive with other applications (non-JVM ones) that have some native access to a database.
Also, not everything is a good fit with relational, tabular databases, so we should also have solid support for “NoSql” databases.

ScottPJones · June 25, 2018, 4:27pm

It has nothing to do with being “cool”. I worked for ages on something that was already considered a “legacy” technology when I started in 1986 - but in the end - what was “legacy” became “NoSQL” or “post-relational” (having hierarchical key/value + object storage [without ORM layer needed] + full SQL), and it figures prominently in reports from analyst groups such as Gartner.

While those techniques you mentioned to mitigate those issues exist (or are on their way), that doesn’t seem like a very productive path to follow, unless you are trying to interface to something that is implemented in the JVM “ecosystem”.

I’ve learned to be very pragmatic about what tools I use - and using JDBC (from a non-JVM environment) has never been fast enough for what I was trying to accomplish (even though I was a big fan of both Java and later on JDBC when they first came out).

lobingera · June 26, 2018, 4:49pm

Just for my curiosity, where is the overlap of “including the development of Julia 1.0 and Pkg3” and “definitely research”?

StefanKarpinski · June 26, 2018, 4:52pm

I’m not quite sure what you’re asking. If those qualify as “definitely research”?

lobingera · June 26, 2018, 5:58pm

So, where is the research in Pkg3? From the outside this ‘just’ looks like SW engineering. Is the grant request public?

tobias.knopp · June 26, 2018, 6:52pm

A grant does not necessarily need to be a research grant. There are different ways of funding.

tobias.knopp · June 26, 2018, 6:55pm

(Although it is a pretty interesting discussion whether open source development in many cases deserves public funding. Currently this is often done to “support” research, implying that research is the only thing public money should be spent on. Open source, on the other hand, can be as impactful as many research projects.)

StefanKarpinski · June 26, 2018, 9:10pm

Exactly. Julia 1.0 and Pkg3 work wasn’t funded as research, it was funded by grants from the Moore and Sloan Foundations and from Invenia. These foundations support development of many pieces of open source scientific infrastructure whether that work would be considered research by academics or not. In fact, it is often precisely because some work is important for the advancement of science yet not something that academia considers worthy of the research title, that they are interested in funding it.

bluesmoon · June 27, 2018, 3:31pm

We’ve written an internal DBI library that is an abstract interface to database drivers, and we’ve also created a PyDBConn driver that provides an interface for wrapping database libraries written in Python using PyCall. Using this, we’ve implemented wrappers for 3 Python database libraries: pyodbc, pygresql and python-snowflake-connector.

It works really well for us, and we use a database for everything we do. There was some mangling needed to convert types efficiently, since Python arrays are row based while Julia arrays are column based.
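To illustrate the row-to-column conversion mentioned above, here is a minimal sketch in Python. It is not the internal library's code; the function name `rows_to_columns` and the sample data are hypothetical, and it only shows the general idea of turning a list of row tuples (what Python database drivers typically return) into one list per column.

```python
from typing import Any, Dict, List, Sequence, Tuple


def rows_to_columns(
    names: Sequence[str], rows: Sequence[Tuple[Any, ...]]
) -> Dict[str, List[Any]]:
    """Turn row-oriented results into a mapping of column name -> values.

    Hypothetical helper for illustration only.
    """
    if not rows:
        # No rows: still return every column, each one empty.
        return {name: [] for name in names}
    # zip(*rows) transposes the rows: the i-th item yielded holds the
    # i-th field of every row, i.e. one whole column.
    return {name: list(column) for name, column in zip(names, zip(*rows))}


# Made-up sample data shaped like the output of a driver's fetchall().
rows = [(1, "a", 1.5), (2, "b", 2.5), (3, "c", 3.5)]
cols = rows_to_columns(["id", "label", "score"], rows)
```

Each column ends up as a single homogeneous list, which is closer to what a column-based consumer expects than a list of mixed-type row tuples.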

If there’s interest, I can see what it would take for us to open source it. The only caveat is that it currently only works on Julia 0.4, and until https://github.com/JuliaLang/julia/pull/27020 is merged in, it won’t work in Julia 0.6.

ScottPJones · June 27, 2018, 5:05pm

I’m surprised that you’re still on v0.4! Any particular reason for that?

This sounds useful - also, in later versions of Julia, PermutedDimsArray might make it easier to handle the row-based Python arrays.

What about v0.7/master? Could you just skip right to that, presuming that #27020 is fixed there?

bluesmoon · June 27, 2018, 5:30pm

We’re still on 0.4 because it’s insanely hard to migrate a large codebase with lots of customers who are averse to change. Anything we write for 0.6 must also work with 0.4 without any warnings.

I’d be happy to move to 0.7, but it’s even harder to move from 0.4 to 0.7. At the moment all the work we’ve been doing to migrate from 0.4 to 0.6 has been shelved because our customers want features and don’t care what’s running underneath, and we only have time to do either features or migration, not both.

ScottPJones · June 27, 2018, 5:42pm

Color me impressed!

There, I’d recommend simply making a clean break, since things are so different: branch off a v0.4-based version of your product, and then simply change everything that needs to be changed for v0.7/v1.0 without worrying about the past.
I know that maintaining compatibility even between v0.6 and master has been rather difficult for me, with a large codebase (all of the stuff in JuliaString.org). It would have been a lot easier to simply say, this is only for v1.0 when it is out.

sherrinm · July 1, 2018, 2:03pm

In over 40 years (sadly) I’ve crawled out from under a number of rocks, and one thing I’ve noticed is that the rocks keep changing size and shape.

The state of database support in Julia is one of the things I keep using as an excuse to not get on with a second edition, the others being version 1.0 and waiting until after JuliaCon.

Adrian (@essenciary) is entirely correct in highlighting the lack of sensible database support in Julia. Of course it is possible to use JDBC and ODBC, although the latter is flagged as not working with v0.6 and has not been touched for over 6 months. It is also possible to use PyCall and Python drivers, but really, is that what Julia database support is all about?

There is nothing wrong with writing wrapper packages around DBMS shared libraries; Julia has a long history of such, and its architecture makes it particularly suited to them. That is not the same as having to install a framework such as the JVM, ODBC or Python with appropriate drivers. Indeed, this is not as easy as perhaps @Steven_Sagaert thinks when hosting websites on many commercial providers.

I’ve worked on large commercial sites, often Java based, but even there I would still not like to propose a Julia/JDBC solution to my now youngers and betters (paid at least). Julia Computing are rightly touting a financial package, JuliaFin; however, the lack of a native connection to Oracle seems to me very bizarre.

Also I’m concerned that the NoSQL support in Julia for databases such as Mongo, Redis, Neo4J has almost entirely dried up. One could say that there is always REST support, but does that argument sound familiar?

So I applaud Adrian’s posting and I am quite willing to assist in any testing etc. Perhaps those interested and I can get together at JuliaCon for a beer - as a Londoner I at least know of a few good pubs in the area.

jlapeyre · July 1, 2018, 4:20pm

For comparison, I looked into the timeline for database support in golang and in python.

Here is a thread about the woeful state of database support in golang, including an unfavorable comparison to … JDBC (although the suggestion was not to wrap the Java interface). This thread was started two years after the release of Go 1 (by Google).

database/sql: Support for database, catalog, schema and table level metadata · Issue #7408 · golang/go · GitHub

The situation for Python was much worse for a few years after the release of 1.0.0. But the HTTP protocol was only two years old at the time of this release. The world was much smaller. So, it’s not the best comparison.

EDIT: The following post from the linked thread is especially relevant. It opens:

Julia had the promise of becoming the next multi-platform tool of choice with, in my opinion, so many advantages over Python and R…

Nah… just kidding. This is the actual opening sentence:

Golang had the promise of becoming the next multi-platform tool of choice with, in my opinion, so many advantages over Java

(my emphasis). The implication is that Golang will not live up to its promise.

github.com/golang/go

database/sql: Support for database, catalog, schema and table level metadata

opened 10:08AM - 25 Feb 14 UTC

gopherbot

by glen.newton:

database/sql does not offer the ability to dynamically… peruse databases, catalogs, schemas and tables and their underlying metadata at runtime. Without this, it is not possible to make, for example, a Go program that can copy arbitrary tables from a database, by examining their metadata at runtime.

What is needed is the equivalent of Java JDBC's DatabaseMetaData: http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html

While JDBC's DatabaseMetaData may seem be overkill, being able to examine a table's metadata in the same manner as the table's data is attractive. In order to support rich and complex interactions with sql databases, emulating JDBC would not be a bad idea. As an initial step however, I would suggest the following minimum extensions to the sql package:

func (db *DB) GetSchemas() (*Rows, error)
JDBC equivalent & explanation: ResultSet getSchemas()
http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getSchemas%28%29

func (db *DB) GetCatalogs() (*Rows, error)
JDBC equivalent & explanation: ResultSet getCatalogs()
http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getCatalogs%28%29

func (db *DB) GetTables(catalog String, schemaPattern String, tableNamePattern String, types String[]) (*Rows, error)
JDBC equivalent & explanation: ResultSet getTables(String catalog, String schemaPattern, String tableNamePattern, String[] types)
http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getTables%28java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String[]%29

func (db *DB) getColumns(catalog String, schemaPattern String, tableNamePattern String, columnNamePattern String) (*Rows, error)
JDBC equivalent & explanation: ResultSet getColumns(String catalog, String schemaPattern, String tableNamePattern, String columnNamePattern)
http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getColumns%28java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String%29

The following are important and should also be considered:

getAttributes(...)
http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getAttributes(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)

getCrossReference(...)
http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getCrossReference%28java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String%29

getIndexInfo(...)
http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getIndexInfo(java.lang.String,%20java.lang.String,%20java.lang.String,%20boolean,%20boolean)

getPrimaryKeys(...)
http://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getPrimaryKeys(java.lang.String,%20java.lang.String,%20java.lang.String)

I would also suggest that bug https://golang.org/issue/5606 be solved in a similar and consistent fashion with something like http://docs.oracle.com/javase/7/docs/api/java/sql/ResultSetMetaData.html

Apologies for using Java as an example, but JDBC has done its homework and I think it represents best practices. How it maps into a Go context is of course to be debated and I am willing to accept that my very direct mapping suggestion may not be the best or the most acceptable to the community.
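For readers unfamiliar with what "examining metadata at runtime" looks like in practice, here is a small sketch in Python using the standard-library `sqlite3` module. It is only an illustration of the idea, not the API proposed in the issue: the helper names `list_tables` and `list_columns` are hypothetical, and the catalog table `sqlite_master` and `PRAGMA table_info` are specific to SQLite, so other databases would need different queries.

```python
import sqlite3
from typing import List, Tuple


def list_tables(conn: sqlite3.Connection) -> List[str]:
    """Return the names of all tables, read from SQLite's own catalog."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in cur.fetchall()]


def list_columns(conn: sqlite3.Connection, table: str) -> List[Tuple[str, str]]:
    """Return (column name, declared type) pairs for one table."""
    # PRAGMA statements cannot take bound parameters, so only accept
    # table names that the catalog itself reports.
    if table not in list_tables(conn):
        raise ValueError(f"unknown table: {table}")
    cur = conn.execute(f'PRAGMA table_info("{table}")')
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in cur.fetchall()]


# Example schema, invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")

tables = list_tables(conn)
user_columns = list_columns(conn, "users")
conn.close()
```

With this kind of information available, a program can discover tables and their columns without knowing the schema in advance, which is the capability the issue describes for copying arbitrary tables.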

ExpandingMan · July 23, 2018, 11:22pm

Currently this is very much being overshadowed by the massive upgrade effort to Julia 0.7. I suspect that pushing this too hard before things have settled down a bit would be counterproductive.

As I’ve said, there shouldn’t be too much code involved in actually implementing this; the difficulty is in getting all the package maintainers on the same page. In principle I don’t even think that this would necessarily involve a separate package, though that may make it more difficult to get people to design uniform interfaces.

As for me my concrete plans are to create a common interface for JDBC.jl and LibPQ.jl (essentially PEP249) as this is something I will definitely need, but again this is overshadowed by the 0.7 transition for me, which has still yet to occur for the most critical code I use in my job. JDBC.jl and LibPQ.jl already have very similar interfaces, so again, this won’t involve very much actual code.
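For anyone who has not used PEP 249, here is a minimal sketch of the pattern in Python, using the standard-library `sqlite3` module as the driver. The helper name `fetch_rows` and the table are hypothetical; the point is that the helper only relies on `cursor()`, `execute()`, `fetchall()` and `close()`, which PEP 249 drivers share. One caveat: the placeholder style (`?` here) varies between drivers.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


def fetch_rows(conn: Any, sql: str, params: Sequence[Any] = ()) -> List[Tuple[Any, ...]]:
    """Run a query through the PEP 249 cursor interface and return all rows.

    Hypothetical helper: it uses only methods defined by PEP 249, so in
    principle any compliant connection object could be passed in.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        return [tuple(row) for row in cur.fetchall()]
    finally:
        cur.close()


# sqlite3 is used here only because it ships with Python.
conn = sqlite3.connect(":memory:")
setup = conn.cursor()
setup.execute("CREATE TABLE items (id INTEGER, name TEXT)")
setup.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.commit()
setup.close()

selected = fetch_rows(conn, "SELECT id, name FROM items WHERE id > ? ORDER BY id", (0,))
conn.close()
```

Application code written against `fetch_rows` does not need to know which driver produced the connection, which is the benefit a shared interface across JDBC.jl and LibPQ.jl would aim for.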

lewis · August 30, 2018, 3:49am

Rather old post, but I created a 501(c)(3) in about 3 months. There is a formation agreement, filing with the IRS for an EIN, and maybe filing for a state business license. You likely need an accountant because even as a non-profit you need to file 990s to confirm you follow the rules to remain a non-profit.

Bureaucratic to be sure, but not that long or expensive. It’s mostly cookie-cutter. Actually running it–now that’s hard: getting funding and doing work.

shishini · May 27, 2019, 4:25pm

Sorry for reviving this topic.

But I am very new to Julia, and working with databases is essential for me.
Has the situation changed recently?

Is there any effort going into this, any projects on GitHub?

ValdarT · May 27, 2019, 5:28pm

The situation could definitely be better but whether that’s an issue for you or not depends on what you need to do exactly.

I’m not aware of any coordinated effort but there is the JuliaDatabases GitHub org with several packages of varying quality as well as some C-library wrappers elsewhere such as LibPQ and Mongoc.

From personal experience, LibPQ and JDBC have worked well for me while ODBC was full of bugs the last time I tried it.

But note that JavaCall and hence JDBC currently don’t work on Julia versions greater than 1.0.x (but do work on v1.0.x).

essenciary · May 28, 2019, 8:34am

As the creator of this thread and maintainer of the SearchLight ORM (which supports SQLite [via SQLite.jl], MySQL [via MySQL.jl], and Postgres [via LibPQ.jl]) I can confirm that accessing these databases definitely works. They are all functional and active open source projects. In general, since these adapters provide low-level query APIs, you can perform a lot of actions. More advanced features not related to querying will most likely be unsupported (like Postgres pub-sub, support for JSON types, etc).

That’s assuming you need relational database support - which is the topic of the thread. If you’re looking for NoSQL or key-value you need to look in other places. Last time I checked (some 6 months ago), the Mongo library was still not functional. There are adapters for key-value stores like Redis and Memcache, but I haven’t used these recently so I can’t comment.

It entirely depends on your use case. What volume of operations are you looking at? From my experience with the 3, low volume operations are OK, but once you take things up a few notches, problems start showing. This is a reflection of the fact that it is still young beta software which hasn’t seen many high volume production deployments. My most extreme use case was importing some tens of millions of rows from CSV files. This caused SQLite to segfault every few tens of thousands of operations. MySQL had (and still has) a problem with a finalizer error which prints to the terminal; the resulting massive output caused the whole app to slow to a halt after some tens of thousands of queries.

Things have improved considerably since I raised the issue. For a data science kind of project where you need to perform standard ETL operations, you’ll most likely be fine. For something like a high volume, low-latency web application backend… well, I encourage you to try it cause we need production tested libraries, but expect that you (and your team) will most likely have to contribute and improve things.
