Status of Julia database libraries

This is a point I keep coming back to (in many different areas).
A good upfront design/architecture always beats after-the-fact optimizations.

That’s really the not-so-secret “secret” to Julia’s success IMO, in programmer performance (productivity), in meeting or beating C/C++ performance at run-time, etc.
The design from the co-creators is responsible for all of that.
Now it’s a matter for all of us to help by fleshing out the many different possibilities that this enables.
It’s too big of a job for just JC, quite frankly (no matter how talented they all are).

1 Like

I agree that it’s a good starting point, and I’d submit it’s critical to have.
However, while I feel it’s good enough for casual database use, it will never allow Julia applications to be competitive with other (non-JVM) applications that have native access to a database.
Also, not everything is a good fit for relational, tabular databases, so we should also have solid support for “NoSQL” databases.

It has nothing to do with being “cool”. I worked for ages on something that was already considered a “legacy” technology when I started in 1986 - but in the end, what was “legacy” became “NoSQL” or “post-relational” (having hierarchical key/value + object storage [without an ORM layer needed] + full SQL), and it figures prominently in reports from analyst groups such as Gartner.

While those techniques you mentioned to mitigate those issues exist (or are on their way), that doesn’t seem like a very productive path to follow, unless you are trying to interface to something that is implemented in the JVM “ecosystem”.

I’ve learned to be very pragmatic about what tools I use - and using JDBC (from a non-JVM environment) has never been fast enough for what I was trying to accomplish (even though I was a big fan of both Java and later on JDBC when they first came out).

Just out of curiosity, where is the overlap of “including the development of Julia 1.0 and Pkg3” and “definitely research”?

I’m not quite sure what you’re asking. If those qualify as “definitely research”?

So, where is the research in Pkg3? From the outside this ‘just’ looks like SW engineering. Is the grant request public?

A grant does not necessarily need to be a research grant. There are different ways of funding.

1 Like

(Although it is a pretty interesting discussion whether open source development in many cases deserves public funding. Currently this is often done to “support” research, implying that research is the only thing public money should be spent on. Open source, on the other hand, can be as impactful as many research projects.)

1 Like

Exactly. Julia 1.0 and Pkg3 work wasn’t funded as research, it was funded by grants from the Moore and Sloan Foundations and from Invenia. These foundations support development of many pieces of open source scientific infrastructure whether that work would be considered research by academics or not. In fact, it is often precisely because some work is important for the advancement of science yet not something that academia considers worthy of the research title, that they are interested in funding it.

14 Likes

We’ve written an internal DBI library that is an abstract interface to database drivers, and we’ve also created a PyDBConn driver that provides an interface for wrapping database libraries written in Python using PyCall. Using this, we’ve implemented wrappers for 3 Python database libraries: pyodbc, pygresql and python-snowflake-connector.

It works really well for us, and we use a database for everything we do. There was some mangling needed to convert types efficiently, since Python arrays are row-based while Julia arrays are column-based.
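A minimal sketch of that kind of row-to-column conversion, written in Python purely for illustration. It is not the code from the internal library described here, and `rows_to_columns` is a made-up helper name. It assumes the driver hands back a sequence of equal-length row tuples, as Python database drivers typically do.

```python
def rows_to_columns(rows, ncols):
    """Turn row-oriented records into one list per column.

    `rows` is any iterable of equal-length tuples (hypothetical input shape);
    `ncols` is passed explicitly so an empty result still yields empty columns.
    """
    columns = [[] for _ in range(ncols)]
    for row in rows:
        if len(row) != ncols:
            raise ValueError(f"expected {ncols} fields, got {len(row)}")
        # Append each field to the column it belongs to.
        for column, value in zip(columns, row):
            column.append(value)
    return columns


# Example data, invented for this sketch.
rows = [(1, "a", 1.5), (2, "b", 2.5), (3, "c", 3.5)]
ids, labels, scores = rows_to_columns(rows, ncols=3)
```

Each resulting list holds values of a single field, which is the shape a column-oriented consumer would want before converting to typed arrays.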

If there’s interest, I can see what it would take for us to open-source it. The only caveat is that it currently only works on Julia 0.4, and until https://github.com/JuliaLang/julia/pull/27020 is merged in, it won’t work in Julia 0.6.

6 Likes

I’m surprised that you’re still on v0.4! Any particular reason for that?

This sounds useful. Also, in later versions of Julia, PermutedDimsArray might make it easier to handle the row-based Python arrays.

What about v0.7/master? Could you just skip right to that, presuming that #27020 is fixed there?

We’re still on 0.4 because it’s insanely hard to migrate a large codebase with lots of customers who are averse to change. Anything we write for 0.6 must also work with 0.4 without any warnings.

I’d be happy to move to 0.7, but it’s even harder to move from 0.4 to 0.7. At the moment all the work we’ve been doing to migrate from 0.4 to 0.6 has been shelved because our customers want features and don’t care what’s running underneath, and we only have time to do either features or migration, not both.

2 Likes

Color me impressed!

There, I’d recommend simply making a clean break, since things are so different: branch off a v0.4-based version of your product, and then change everything that needs to be changed for v0.7/v1.0 without worrying about the past.
I know that maintaining compatibility even between v0.6 and master has been rather difficult for me with a large codebase (all of the stuff in JuliaString.org); it would have been a lot easier to simply say that this is only for v1.0 when it is out.

1 Like

In over 40 years (sadly) I’ve crawled out from under a number of rocks, and one thing I’ve noticed is that the rocks keep changing size and shape.

The state of database support in Julia is one of the things I keep using as an excuse to not get on with a second edition, the others being version 1.0 and waiting until after JuliaCon.

Adrian (@essenciary) is entirely correct in highlighting the lack of sensible database support in Julia. Of course it is possible to use JDBC and ODBC, although the latter is flagged as not working with v0.6 and has not been touched for over 6 months; it is also possible to use PyCall and Python drivers. But really, is that what Julia database support is all about?

There is nothing wrong with writing wrapper packages around DBMS shared libraries; Julia has a long history of such, and its architecture makes it particularly suited to them. That is not the same as having to install a framework such as the JVM, ODBC or Python with appropriate drivers. Indeed, this is not as easy as perhaps @Steven_Sagaert thinks when hosting websites on many commercial providers.

I’ve worked on large commercial sites, often Java-based, but even there I would still not like to propose a Julia/JDBC solution to my now youngers and betters (better paid, at least). Julia Computing are rightly touting a financial package, JuliaFin; however, the lack of a native connection to Oracle seems very bizarre to me.

Also, I’m concerned that the NoSQL support in Julia for databases such as Mongo, Redis and Neo4J has almost entirely dried up. One could say that there is always REST support, but does that argument sound familiar?

So I applaud Adrian’s posting and I am quite willing to assist in any testing etc. Perhaps those interested and I can get together at JuliaCon for a beer - as a Londoner I at least know of a few good pubs in the area.

1 Like

For comparison, I looked into the timeline for database support in golang and in python.

Here is a thread about the woeful state of database support in golang, including an unfavorable comparison to … JDBC (although the suggestion was not to wrap the Java interface). This thread was started two years after the release of Go 1 (by Google).

database/sql: Support for database, catalog, schema and table level metadata · Issue #7408 · golang/go · GitHub

The situation for Python was much worse for a few years after the release of 1.0.0. But the HTTP protocol was only two years old at the time of this release. The world was much smaller. So, it’s not the best comparison.

EDIT: The following post from the linked thread is especially relevant. It opens:

Julia had the promise of becoming the next multi-platform tool of choice with, in my opinion, so many advantages over Python and R…

Nah… just kidding. This is the actual opening sentence:

Golang had the promise of becoming the next multi-platform tool of choice with, in my opinion, so many advantages over Java

(my emphasis). The implication is that Golang will not live up to its promise.

8 Likes

Currently this is very much being overshadowed by the massive upgrade effort to Julia 0.7. I suspect that pushing this too hard before things have settled down a bit would be counterproductive.

As I’ve said, there shouldn’t be too much code involved in actually implementing this, the difficulty is in getting all the package maintainers on the same page. In principle I don’t even think that this would necessarily involve a separate package, though that may make it more difficult to get people to design uniform interfaces.

As for me, my concrete plans are to create a common interface for JDBC.jl and LibPQ.jl (essentially PEP 249), as this is something I will definitely need. But again, this is overshadowed for me by the 0.7 transition, which has yet to occur for the most critical code I use in my job. JDBC.jl and LibPQ.jl already have very similar interfaces, so again, this won’t involve very much actual code.
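For readers who haven’t seen PEP 249, here is a rough sketch of what that style of interface looks like in Python, where it originated. It uses the standard-library `sqlite3` driver only because it needs no server; `run_query` and the `pkg` table are invented for this illustration and say nothing about how JDBC.jl or LibPQ.jl are actually structured.

```python
import sqlite3


def run_query(conn, query, params=()):
    """Run a query using only calls that PEP 249 defines.

    Anything exposing cursor(), execute(), fetchall() and close() would work
    here, which is the point of having a common interface.
    """
    cur = conn.cursor()
    try:
        cur.execute(query, params)
        return cur.fetchall()
    finally:
        cur.close()


# Set up a throwaway in-memory database (example data, invented for this sketch).
conn = sqlite3.connect(":memory:")
setup = conn.cursor()
setup.execute("CREATE TABLE pkg (name TEXT, backend TEXT)")
setup.executemany(
    "INSERT INTO pkg VALUES (?, ?)",
    [("LibPQ.jl", "PostgreSQL"), ("JDBC.jl", "JVM")],
)
conn.commit()

# sqlite3 uses the "qmark" parameter style; other PEP 249 drivers may differ.
rows = run_query(conn, "SELECT name FROM pkg WHERE backend = ?", ("PostgreSQL",))
conn.close()
```

The calling code only touches the connection and cursor methods, so swapping the driver would mostly mean changing the `connect` call and, where drivers disagree, the parameter placeholder style.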

2 Likes

Rather old post, but I created a 501(c)(3) in about 3 months. There is a formation agreement, filing with the IRS for an EIN, and maybe filing for a state business license. You likely need an accountant because even as a non-profit you need to file Form 990s to confirm you follow the rules to remain a non-profit.

Bureaucratic to be sure, but not that long or expensive. It’s mostly cookie-cutter. Actually running it–now that’s hard: getting funding and doing work.

Sorry for reviving this topic.

But I am very new to Julia, and working with databases is essential for me.
Has the situation changed recently?

Is there any effort going into this, any projects on github?

The situation could definitely be better but whether that’s an issue for you or not depends on what you need to do exactly.

I’m not aware of any coordinated effort but there is the JuliaDatabases GitHub org with several packages of varying quality as well as some C-library wrappers elsewhere such as LibPQ and Mongoc.

From personal experience, LibPQ and JDBC have worked well for me while ODBC was full of bugs the last I tried it.

But note that JavaCall, and hence JDBC, currently don’t work on Julia versions greater than 1.0.x (they do work on v1.0.x).

3 Likes

As the creator of this thread and maintainer of the SearchLight ORM (which supports SQLite [via SQLite.jl], MySQL [via MySQL.jl], and Postgres [via LibPQ.jl]) I can confirm that accessing these databases definitely works. They are all functional and active open source projects. In general, since these adapters provide low-level query APIs, you can perform a lot of actions. More advanced features not related to querying will most likely be unsupported (like Postgres pub-sub, support for JSON types, etc).

That’s assuming you need relational database support - which is the topic of the thread. If you’re looking for NoSQL or key-value stores, you need to look in other places. Last time I checked (some 6 months ago), the Mongo library was still not functional. There are adapters for key-value stores like Redis and Memcache, but I haven’t used these recently so I can’t comment.

It entirely depends on your use case. What volume of operations are you looking at? From my experience with the 3, low-volume operations are OK, but once you take things up a few notches, problems start showing. This reflects the fact that it is still young beta software which hasn’t seen many high-volume production deployments. My most extreme use case was importing some tens of millions of rows from CSV files. This caused SQLite to segfault every few tens of thousands of operations. MySQL had (and still has) a problem with a finalizer error that prints to the screen; the resulting flood of terminal output slowed the whole app to a halt after some tens of thousands of queries.

Things have improved considerably since I raised the issue. For a data science kind of project where you need to perform standard ETL operations, you’ll most likely be fine. For something like a high-volume, low-latency web application backend… well, I encourage you to try it because we need production-tested libraries, but expect that you (and your team) will most likely have to contribute and improve things.

10 Likes