Status of Julia database libraries

But there is no such “technical” impossibility, as one can always implement DB packages in Julia. It is just easier if someone else does that.

You misrepresent my position, which is the following:

  1. FOSS libraries for DBs will probably be implemented by people who use them,

  2. the best way to make this happen is contributing.

4 Likes

Well, for many people, it really is an impossibility, since they simply don’t have the technical skills in that area (even though they may be super smart and experts in their field), or, even if they did, they don’t have the time to both write a whole DB package for Julia and get their real project built on time.

When people say that, I don’t see that as some sort of threat (which would be silly), I see that as somebody who really, really, really would like to be able to use Julia (for all the reasons that we love Julia), but is very :sob: that they can’t (at this point).

Again, most people who use them may not have the skills to implement them.
People who do have the skills (and background knowledge) to do so, if they have to do so for their job, may not be able to make the resultant package open source (that happened to me, for almost all of the Julia code that I wrote over the last 3 years).

Hopefully, this can be handled another way: either via a grant or some crowdfunding scheme, to pay for a small group of Julians to work on this (I’d love to do nothing more than spend 100% of my time improving the Julia ecosystem, however, I have to earn enough money to pay our mortgages, feed my family, and save some for retirement + college for my two boys).

5 Likes

Or, possibly, patience. My impression is that people are working on these issues actively, and they will be resolved at some point.

4 Likes

We need a #gripes channel in discourse :slightly_smiling_face:

3 Likes

While these threads are often somewhat painful to read (I believe it’s hard to devote time to a project X for free and primarily get as feedback a “Julia will fail because X is not good enough”), they are also encouraging. A few years ago (when starting up Julia without loading any package took an eternity, plotting was only available via pyplot, there was no Juno or VS Code IDE) nobody was expecting good native Julia packages for most things; it was more of a DIY language. Now Julia is starting to offer native packages to solve most problems, in some cases equivalent to the competition, in other cases better and in other cases worse. As a consequence, it is attracting a different type of user, one who expects a more polished experience where everything they need is already there.

While it’s certainly positive that this type of user base starts using Julia eventually, I don’t think it is crucial to convince them “urgently” as they are unlikely to contribute to the language and ecosystem. Still, it is encouraging that even for users who only care about using pre-existing libraries, Julia has become a plausible option: this I think is a major success.

9 Likes

On the topic of funds, I would like to draw attention again to this.

I had requested that all proposals be submitted to me by June 22 (please read the whole thread), but I am happy to extend until June 25, if there is interest in proposing work on DB libraries as part of this. Also do note, that NumFocus runs this on an ongoing basis, so this is not the only time to submit.

We could also seriously consider having a GSoC project or two on this topic next year.

I think it is a good start to voice these concerns and gather others who care about such functionality. Money is not the hardest part, in my opinion. What happens after the work on a set of high quality db access libraries is funded?

Someone still has to maintain them - and the community that owns and maintains those packages will need to be motivated by their user communities. This is a good start, but we probably need more people to file issues, weigh in on design considerations, and perhaps even consolidate packages. These are the questions anyone funding will ask - because their biggest question often is whether work they fund now will continue to be maintained, or whether it needs ongoing funds. Of course, reaching 1.0 helps in a big way.

-viral

25 Likes

Maybe I’m a “glass half-full” kind of guy, but to me, if I’m working on X, and get the feedback that X is so important that people feel that Julia might fail without a better X, that just vindicates to me that the time I’m spending on X is justified and that X will be useful when finished.

For example, lately we’ve been pointing out a lot of issues with Pkg3, but that’s NOT at all because 1) we don’t appreciate the massive work that Stefan and Kristoffer have been doing or 2) we think they are doing it all wrong (I’m pretty happy with what I’ve seen), it’s because we really really want it to succeed.

3 Likes

Irrespective of the merits/demerits of the OP’s (@essenciary, Adrian) post, he has contributed significantly to the Julia ecosystem.

6 Likes

It’s great to hear everyone chiming in here and seeing all the need/want for database support in Julia! I’ll give a couple thoughts on current/future development of database support in Julia:

  • I actually don’t think database support is too bad in Julia; as others have mentioned, between JDBC.jl and ODBC.jl, one can at least connect and do basic things w/ most databases and at least between MySQL.jl, SQLite.jl, ODBC.jl, and LibPQ.jl, they all sport roughly similar interfaces not unlike a DBAPI set of methods.

  • Now obviously there are issues as well: outstanding github issues, performance opportunities, broader support across platforms. But IMO, there’s nothing a week or two of diving in and fixing things can’t solve. I’ll try over the next week or two to do just that.

  • But in terms of broader API design, I know there’s been a little recent interest in reviving something like DBAPI.jl, but as with the 3-5 times this has happened in the past, nothing has really manifested beyond the initial rally of “hey, maybe we need something like this!”

  • For me, I’ve never personally seen a ton of value in having something like DBAPI.jl; sure, in the simplest cases, you might be able to swap out one db w/ another. But realistically, how often do you really need to be swapping databases within an application? Sure there’s value in knowing how to “guess” how to connect to a new database, but IMO, it’s worth 15-30 minutes of reading up on a new database if you’re thinking about using it (or have to) in an application. There’s a reason there are so many databases: no two are exactly the same, and there are important differences in usage and APIs to be aware of when using them. There’s certainly value in trying to keep basic operations as similar as possible (executing a query, fetching results, etc); but there will always be additional db-specific operations/functionality to be aware of.

  • On that note, I think there’s certainly some cleanup due that would help unify some of the APIs across the existing packages. One of the advantages of the current API of SQLite.jl, MySQL.jl, and ODBC.jl is that through implementing the DataStreams interface, tables of all kinds (other databases, dataframes, file formats, etc.) can be input or output automatically.

  • Call to action: I’ve created the #databases slack channel to help facilitate ongoing discussion on issues and APIs. Heavy users, contributors (potential and otherwise), and anyone else are welcome to join and help. In particular, it would be great to channel all this passion towards organizing efforts to help fix issues, increase platform coverage, and unify APIs. If you’re willing to help test a database on a platform or two, please chime in.
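For anyone unsure what “a DBAPI set of methods” refers to in the bullets above: the name comes from Python’s DB-API 2.0 (PEP 249), which DBAPI.jl was loosely modelled on. Below is a minimal sketch of that connect / execute / fetch shape, written in Python with the standard-library `sqlite3` module. It is not the API of any Julia package, and the `users` table and its rows are made up purely for illustration.

```python
import sqlite3


def run_demo():
    """Connect, execute statements, and fetch results, DB-API style."""
    # An in-memory database keeps the sketch self-contained.
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        # Hypothetical table, used only for this illustration.
        cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        # Parameter placeholders (?) instead of string interpolation.
        cur.executemany(
            "INSERT INTO users (name) VALUES (?)",
            [("ada",), ("grace",)],
        )
        conn.commit()
        cur.execute("SELECT id, name FROM users ORDER BY id")
        return cur.fetchall()
    finally:
        conn.close()


rows = run_demo()
print(rows)
```

The point of such an interface is that only the `connect(...)` line is expected to change between databases; the db-specific functionality mentioned above still has to be learned per database.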

Cheers everyone!

39 Likes

Thank you all for the amazing feedback - I’ve been reading carefully and I think a lot of useful information has surfaced.

1 - there actually are ways in which such “infrastructure” projects can be funded. @viralbshah NumFocus sounds great and I’m happy to look into it. The June 25th timeframe is unrealistic as far as I’m concerned (especially as I’m clueless in regards to this kind of work) but I’m confident that I can prepare something for the next application period.

2 - I won’t delve into @randyzwitch’s reply – everybody is entitled to their own opinion and he expressed his in an inoffensive and polite manner and I respect that. However, as far as I’m concerned (and I think I speak for many other contributors):
a. I did put a couple of thousand hours of my personal time into open source Julia development. This is not directly related to core Julia or scientific computing but nonetheless, it can have a meaningful positive impact on the language’s and community’s development. If I’m not interested or competent to work on low level DB libraries, even if I need them, it doesn’t mean that I can’t help in other ways: using the software, reporting issues, raising awareness through posts like this and developing other useful software.
b. as contributors, we do have certain expectations that the language will support features which allow us to focus on the areas of development that interest us. This is a truism: if it wasn’t so, we wouldn’t use an existing language, but rather each of us would develop their own language from scratch.

3 - I’m not the only one who is directly affected by the state of the DB libraries. Before this the default approach was “DB access is irrelevant for my work, good bye”. I’m happy to hear that this is not the case and I believe we made a clear case that we, as a community, need this.

4 - @quinnj
a. thank you so much for putting this higher on your list of priorities and undertaking this massive effort!
b. I wouldn’t worry about DBAPI. First, we need to have basic CRUD functionality which is stable and performant. I don’t see a lot of value in DBAPI either, simply because I prefer higher level ORMs.
c. on the same topic, instead of seeing precious time going towards things like DBAPI, after stable CRUD features I believe we’d get more value from DBMS specific support (like for example Postgres pub-sub or support for Postgres JSON types).
d. I have discarded ODBC and JDBC because frankly, they’re not standard for web development. They are cumbersome, limited and not as performant as the native libraries. Also, having to install a Java stack just to access the DB would make a sysadmin hate my guts. As a CTO or architect, I could never justify such a thing to my team. However, I will test them next, in my personal project, in the same conditions – I’m curious to see if they work. I will report back on this.
e. great to hear about the new channel, thank you. It’s of utmost importance to leverage the power of open source contributions. There’s nothing more discouraging than reporting issues and not hearing back for weeks (or ever).
f. I will allocate some time to help track down the performance bottleneck in MySQL.jl / Julia 0.7. I don’t know if I’ll be able to fix it, but at least I’ll pinpoint the source of the problem.
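To make “basic CRUD functionality” from point b concrete, here is a rough sketch of the four operations a driver has to get right before anything higher level (ORMs, DBMS-specific features) can be built on it. It is in Python with the standard-library `sqlite3` module, purely as an illustration; the `articles` table and its values are hypothetical, and nothing here reflects the API of MySQL.jl, LibPQ.jl or any other Julia package.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this sketch.
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")

# Create: insert a row and remember the generated primary key.
cur = conn.execute("INSERT INTO articles (title) VALUES (?)", ("Hello",))
article_id = cur.lastrowid

# Read: fetch the row back by primary key.
title_before = conn.execute(
    "SELECT title FROM articles WHERE id = ?", (article_id,)
).fetchone()[0]

# Update: change the title in place.
conn.execute(
    "UPDATE articles SET title = ? WHERE id = ?", ("Hello, Julia", article_id)
)
title_after = conn.execute(
    "SELECT title FROM articles WHERE id = ?", (article_id,)
).fetchone()[0]

# Delete: remove the row and confirm the table is empty again.
conn.execute("DELETE FROM articles WHERE id = ?", (article_id,))
remaining = conn.execute("SELECT COUNT(*) FROM articles").fetchone()[0]

conn.commit()
conn.close()
```

A higher level ORM would generate statements like these from model definitions, which is why it depends on the driver handling them reliably.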

Thanks again to everybody that took the time to express their opinions, concerns and to reply and inform us!

25 Likes

Julia ORM would be great!

I don’t know which rock you’ve been hiding under for the last 20 years but JDBC (and I guess ODBC also in the .NET/MS world) is very standard in web dev that’s based on the JVM, and since Java is the number one language overall and very dominant in server side dev (especially anything “enterprise”) that’s a lot of web dev! Also almost all of the big data server frameworks are Java or Scala based and hence their db access is also based on JDBC. Note that all the JVM ORM frameworks also use JDBC underneath (these do have potential performance problems if used naïvely but that’s due to querying not JDBC).

I’ve been doing java dev for 20+ years and never really had any performance issues due to JDBC. If your query is slow then it’s usually the query/indexing on the DB not the performance of JDBC. Sure you might squeeze out a little more performance for mass loading/dump via a C API but is it worthwhile? Not for regular web dev.

The great advantage of JDBC is that it’s simple and platform independent. You can use the same driver on any platform for the same DB and it will just work. No complex setup or config: just add the driver to your classpath, set the username & password and you’re set to go. That’s worth gold.

I think that using JDBC from Julia to access DBs is a good and valid option, and the overhead of instantiating a JVM isn’t that big a deal, especially since the other options aren’t that stable at the moment. With JDBC all the heavy lifting is already done by the JDBC driver; you just need some simple JavaCall calls from Julia to make it work.

1 - “I don’t know which rock you’ve been hiding under for the last 20 years” – do you think that starting with this will help you or the point you’re trying to make? Is this necessary or helpful?

2 - I agree that the JVM is a great piece of software and that Java is used a lot, especially in enterprise web development. But this is a Julia discussion forum and Julia does not run on top of JVM. The same applies to all the languages I’ve been using in the last 20 years: PHP, Ruby, JavaScript, Elixir, Lua and a few others. Adding a JVM to a non Java dev stack, just to access a DB over JDBC, is a useless complication and it’s not done. It adds development, admin, security, learning, recruitment, etc overhead to any non-Java team and stack.

3 - this is a Julia forum and I couldn’t care less about your Java proselytism. If you’re happy with having Java as a dependency of your Julia stack, go wild. That’s not the point of this thread.

15 Likes

Well on the one hand you complain that there aren’t any good solutions for db access from julia but you also discard widely used standard ones you can build on like JDBC or ODBC… Even core julia uses third party libs like BLAS etc.

Would it be nice to have a solution where you do not have to go through Java? Sure, but in the absence of such stable, battle-tested solutions I’d go with a pragmatic solution rather than a dogmatic one. Just my 2 cents…

Maybe you should be a little less arrogant in your answers.

2015 IEEE Ranking

[image: IEEE ranking 2015]

2017 IEEE Ranking

[image: IEEE ranking 2017]

If Java is losing places, there are certainly some good reasons.
If you want to look professional, it’s better to keep yourself informed than to rant at your colleagues.

From my point of view, having Java as a dependency of a Julia web app is not a good solution.

9 Likes

I have not changed my belief that writing the actual code for a common database API should be pretty quick. The difficulty in the project is not in writing the code, it’s in collaborating with the community on something that we all agree on and are happy with.

I have indeed not forgotten about this, and have been looking into it. As I’ve said elsewhere, the major obstacle to me seems to be that any database API should naturally include DataStreams somehow, but the DataStreams interface is already implemented in (I think all of) the currently active database packages. It is very good that DataStreams is this widely adopted, but it makes coming up with a common interface a little awkward right now, as it looks like there would be a substantial intersection between any new common API and currently existing interfaces.
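To illustrate why a streaming interface and a common database API overlap, here is a rough sketch of the general source/sink idea: anything that can produce rows is a source, anything that can consume them is a sink, and one generic function connects the two. It is written in Python and is only an analogy. The class and function names (`CsvSource`, `SqliteSink`, `stream`) are invented for this sketch and are not the DataStreams.jl API.

```python
import csv
import io
import sqlite3


class CsvSource:
    """Hypothetical source: yields rows parsed from CSV text."""

    def __init__(self, text):
        reader = csv.reader(io.StringIO(text))
        self._header = next(reader)  # first line holds the column names
        self._reader = reader

    def columns(self):
        return list(self._header)

    def rows(self):
        return iter(self._reader)


class SqliteSink:
    """Hypothetical sink: writes rows into a new SQLite table."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table

    def write(self, columns, rows):
        # Identifiers are interpolated, so table and column names are
        # assumed to be trusted in this sketch; values use placeholders.
        cols = ", ".join(f'"{c}"' for c in columns)
        marks = ", ".join("?" for _ in columns)
        self.conn.execute(f'CREATE TABLE "{self.table}" ({cols})')
        self.conn.executemany(
            f'INSERT INTO "{self.table}" VALUES ({marks})', rows
        )
        self.conn.commit()


def stream(source, sink):
    """Move every row from any source into any sink."""
    sink.write(source.columns(), source.rows())


conn = sqlite3.connect(":memory:")
stream(CsvSource("id,name\n1,ada\n2,grace\n"), SqliteSink(conn, "people"))
result = conn.execute('SELECT id, name FROM "people" ORDER BY id').fetchall()
conn.close()
```

Once every database package already exposes something like `rows()` and `write()`, a separate common API for fetching and loading results would cover much of the same ground, which is the awkwardness described above.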

I have not forgotten that I said I would work on this, and I will, but from where I sit things are already a bit in chaos right now because of the 0.7 transition, so I’m not in a really big hurry to get this done until things calm down a bit (and I don’t think we should bother to support 0.6).

I will be getting in touch with @quinnj (who also wrote the entire DataStreams package himself) and @iamed2 on Slack and I’m sure we’ll get some sort of common database interface going at some point.

2 Likes

Regardless, this detour does not add any value to the conversation.

We have already moved on to actually doing things in Julia. The #database Slack channel is a great first step, we have a list of conclusions, we have a few clear next steps and I’m looking forward to working on this and getting it right.

5 Likes