Status of Julia database libraries

Thank you all for the amazing feedback - I’ve been reading carefully and I think a lot of useful information has surfaced.

1 - there actually are ways in which such “infrastructure” projects can be funded. @viralbshah NumFOCUS sounds great and I’m happy to look into it. The June 25th timeframe is unrealistic as far as I’m concerned (especially as I’m clueless about this kind of work) but I’m confident that I can prepare something for the next application period.

2 - I won’t delve into @randyzwitch’s reply - everybody is entitled to their own opinion, and he expressed his in an inoffensive and polite manner, and I respect that. However, as far as I’m concerned (and I think I speak for many other contributors):
a. I did put a couple of thousand hours of my personal time into open source Julia development. This is not directly related to core Julia or scientific computing but, nonetheless, it can have a meaningful positive impact on the development of the language and the community. If I’m not interested in, or competent to work on, low-level DB libraries, even if I need them, it doesn’t mean that I can’t help in other ways: using the software, reporting issues, raising awareness through posts like this and developing other useful software.
b. as contributors, we do have certain expectations that the language will support features which allow us to focus on the areas of development that interest us. This is a truism: if it wasn’t so, we wouldn’t use an existing language, but rather each of us would develop their own language from scratch.

3 - I’m not the only one who is directly affected by the state of the DB libraries. Before this, the default approach was “DB access is irrelevant for my work, good bye”. I’m happy to hear that this is not the case and I believe we made a clear case that we, as a community, need this.

4 - @quinnj
a. thank you so much for putting this higher on your list of priorities and undertaking this massive effort!
b. I wouldn’t worry about DBAPI. First, we need to have basic CRUD functionality which is stable and performant. I don’t see a lot of value in DBAPI either, simply because I prefer higher level ORMs.
c. on the same topic, instead of seeing precious time going towards things like DBAPI, after stable CRUD features I believe we’d get more value from DBMS specific support (like for example Postgres pub-sub or support for Postgres JSON types).
d. I have discarded ODBC and JDBC because, frankly, they’re not standard for web development. They are cumbersome, limited and not as performant as the native libraries. Also, having to install a Java stack just to access the DB would make a sysadmin hate my guts. As a CTO or architect, I could never justify such a thing to my team. However, I will test them next, in my personal project, under the same conditions. I’m curious to see if they work. I will report back on this.
e. great to hear about the new channel, thank you. It’s of utmost importance to leverage the power of open source contributions. There’s nothing more discouraging than reporting issues and not hearing back for weeks (or ever).
f. I will allocate some time to help track down the performance bottleneck in MySQL.jl / Julia 0.7. I don’t know if I’ll be able to fix it, but at least I’ll pinpoint the source of the problem.

Thanks again to everybody who took the time to express their opinions and concerns, and to reply and inform us!


Julia ORM would be great!

I don’t know which rock you’ve been hiding under for the last 20 years but JDBC (and I guess ODBC also in the .NET/MS world) is very standard in web dev that’s based on the JVM, and since Java is the number one language overall and very dominant in server side dev (especially anything “enterprise”) that’s a lot of web dev! Also almost all of the big data server frameworks are Java or Scala based and hence their db access is also based on JDBC. Note that all the JVM ORM frameworks also use JDBC underneath (these do have potential performance problems if used naïvely but that’s due to querying not JDBC).

I’ve been doing Java dev for 20+ years and never really had any performance issues due to JDBC. If your query is slow then it’s usually the query/indexing on the DB, not the performance of JDBC. Sure, you might squeeze out a little more performance for mass loading/dump via a C API, but is it worthwhile? Not for regular web dev.

The great advantage of JDBC is that it’s simple and platform independent. You can use the same driver on any platform for the same DB and it will just work. No complex setup or config: just add the driver to your classpath, set the username & password and you’re set to go. That’s worth gold.

I think that using JDBC from Julia to access DBs is a good and valid option, and the overhead of instantiating a JVM isn’t that big a deal, especially since the other options aren’t that stable at the moment. With JDBC all the heavy lifting is already done by the JDBC driver; you just need some simple JavaCall calls from Julia to make it work.

1 - “I don’t know which rock you’ve been hiding under for the last 20 years” – do you think that starting with this will help you or the point you’re trying to make? Is this necessary or helpful?

2 - I agree that the JVM is a great piece of software and that Java is used a lot, especially in enterprise web development. But this is a Julia discussion forum and Julia does not run on top of the JVM. The same applies to all the languages I’ve been using in the last 20 years: PHP, Ruby, JavaScript, Elixir, Lua and a few others. Adding a JVM to a non-Java dev stack, just to access a DB over JDBC, is a useless complication and it’s not done. It adds development, admin, security, learning, recruitment, etc. overhead to any non-Java team and stack.

3 - this is a Julia forum and I couldn’t care less about your Java proselytism. If you’re happy with having Java as a dependency of your Julia stack, go wild. That’s not the point of this thread.


Well, on the one hand you complain that there aren’t any good solutions for DB access from Julia, but you also discard widely used standard ones you can build on, like JDBC or ODBC… Even core Julia uses third-party libs like BLAS etc.

Would it be nice to have a solution where you do not have to go through Java? Sure, but in the absence of such stable, battle-tested solutions I’d go with a pragmatic solution rather than a dogmatic one. Just my 2 cents…

Maybe you should be a little less arrogant in your answers.

[Image: 2015 IEEE Ranking]

[Image: 2017 IEEE Ranking]

If Java is losing places, there are certainly some good reasons.
If you want to look professional, it’s better to keep yourself informed than to rant at your colleagues.

From my point of view, having Java as a dependency of a Julia web app is not a good solution.


I have not changed my belief that writing the actual code for a common database API should be pretty quick. The difficulty in the project is not in writing the code, it’s in collaborating with the community on something that we all agree on and are happy with.

I have indeed not forgotten about this, and have been looking into it. As I’ve said elsewhere, the major obstacle to me seems to be that any database API should naturally include DataStreams somehow, but the DataStreams interface is already implemented in (I think all of) the currently active database packages. It is very good that DataStreams is this widely adopted, but it makes coming up with a common interface a little awkward right now, as it looks like there would be a substantial intersection between any new common API and currently existing interfaces.

I have not forgotten that I said I would work on this, and I will, but from where I sit things are already a bit in chaos right now because of the 0.7 transition, so I’m not in a really big hurry to get this done until things calm down a bit (and I don’t think we should bother to support 0.6).

I will be getting in touch with @quinnj (who also wrote the entire DataStreams package himself) and @iamed2 on Slack and I’m sure we’ll get some sort of common database interface going at some point.


Regardless, this detour does not add any value to the conversation.

We have already moved on to actually doing things in Julia. The #database Slack channel is a great first step, we have a list of conclusions, we have a few clear next steps and I’m looking forward to working on this and getting it right.


Please keep the discussion polite and constructive. Otherwise we’ll have to lock it.


As I mentioned in some other thread, the success story of big data in Java is more likely to be the result of historical conditions than of the language or platform itself. In fact, the JVM is quite suboptimal for this kind of task, e.g. see this comparison of MapR-FS and HDFS:

The second architectural difference I want to talk about is the fact that MapR is written in native code and talks to directly to disk. We’ve created a lot of optimizations in writing to disk that help with performance and scalability. Contrast that with HDFS, which is written in Java. It’ll necessarily run in the JVM and then it’ll talk to a Linux file system before it talks to disks, so you have a few layers there that will impact performance and scalability.

It’s also not quite correct to say that the JVM is ahead of other platforms. In some cases, for some languages and tasks, yes, but there are also a number of shortcomings. For example, for many years I was fascinated by improvements to the JVM’s garbage collectors: several architectures, optimized for different use cases, with so many clever solutions (G1 is my favorite)! That was until I realized that the best way to deal with garbage objects is not to create them: in performance-critical sections of a Julia program I rarely allocate much, but instead use pre-allocated and stack-allocated objects, something that is quite hard to do in Java. So the whole rocket-science-level GC becomes unnecessary.

Regarding using JDBC, having been somewhat involved in the development of JavaCall.jl, I’d discourage you from talking to the JVM when a native solution is available. The JVM is terribly unfriendly to other languages: at the very least you should expect that all the data available in the JVM will need to be copied to Julia memory before use (PySpark and, by inheritance, Spark.jl actually use a TCP connection to pass data between processes instead).

And more importantly, making Julia depend on the JVM for such a basic thing as database access may discourage many people from switching to it, so we should at least have an alternative.


From my point of view, having Java as a dependency of a Julia web app is not a good solution.

I would generally agree with that sentiment too. But if JDBC drivers are solid, and they get us a bunch of stability and functionality right away, isn’t that a good starting point? I would agree that nothing is as good as high quality native access, but perhaps we could take an easier path to complete the user experience for the Julia webapp developer, and then surgically plug in high quality native drivers.

In much the same way that we really want a Julia BLAS that is hackable, but we use other BLAS libraries until we get there.

-viral


Roll on the Julia BLAS. Many times in the benchmarking discussions here the answer is “Well, your runtime is dominated by the system BLAS”. So Julia really cannot do much about that.

Yes that is a weak point of the JVM (and similar runtimes) but you can actually do something about this:

  1. Currently you can do “off-heap” storage via ByteBuffer (or a third-party lib). However, this is a low-level mechanism. See e.g. “On-heap vs off-heap storage” on waitingforcode.com or “On Heap vs Off Heap Memory Usage” on DZone, or just google “java off-heap”. This is used, for example, in high-frequency trading.
  2. There might be a fundamental change to the JVM in the future, called Value Types, for this purpose: basically adding C-like structs to the JVM. See for example “The Current State of Java Value Types”.

Of course that will still not beat C/C++ in terms of memory usage (although you’ll get pretty close), and there’s also the question of memory stability.
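To make the off-heap point from the list above a bit more concrete, here is a minimal sketch using `ByteBuffer.allocateDirect` from the standard `java.nio` package. The class name `OffHeapDemo` and the method `sumOffHeap` are made up for this illustration, and the sketch only shows the basic mechanism (fixed-width values stored in a direct buffer instead of as objects on the heap). It is not a benchmark and makes no claim about how trading systems actually use it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class: stores longs in a direct (off-heap) buffer
// instead of allocating objects or arrays on the garbage-collected heap.
final class OffHeapDemo {

    // Writes the values 0..n-1 into a direct buffer, then reads them back
    // and returns their sum.
    static long sumOffHeap(int n) {
        // A direct buffer's contents may reside outside the normal
        // garbage-collected heap; only the small ByteBuffer wrapper
        // object itself lives on the heap.
        ByteBuffer buf = ByteBuffer
                .allocateDirect(n * Long.BYTES)
                .order(ByteOrder.nativeOrder());

        // Absolute puts: each long occupies Long.BYTES (8) bytes.
        for (int i = 0; i < n; i++) {
            buf.putLong(i * Long.BYTES, (long) i);
        }

        // Absolute gets: no per-element objects are created here.
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += buf.getLong(i * Long.BYTES);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumOffHeap(1000)); // prints 499500
    }
}
```

The trade-off is visible even in this toy: you do the offset arithmetic yourself, which is why it is fair to call this a low-level mechanism.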

Agreed that a pure Julia DB solution is preferable (I never said otherwise), but building on an existing big ecosystem (like the JVM) should also not be dismissed out of hand just because one does not like Java/JVM because “it’s not cool anymore” (that seems to be a theme among young developers nowadays).

Sure, as a necessary and temporary evil. I’m using an ORM I’ve built, so swapping backends is easy, and I’ve wanted to add support for ODBC and JDBC for a long time anyway :) Hopefully I’ll be able to report that ODBC and JDBC work great and that they are viable alternatives until we get native support.

Considering that Java itself is only 22 years old (if you count from the v1.0 release; 23 if you count from the alpha in 1995, which is when I started with it, and I also went to all of the first years of JavaOne), and that I also helped out with implementing the JDBC/ODBC server & client for Caché, I think that’s a great exaggeration (as well as not the sort of thing that is helpful to the community).

Also, I’ve seen many times where using JDBC simply was not performant (because of issues of going back and forth between JVM and a C callin/callout interface, not because JDBC, when used from a JVM based language, wasn’t performant). That’s precisely the situation that Julia is in.


I’m familiar with JRuby for example, and I think it’s awesome. The JVM is a great piece of technology. The Spring framework is super cool. Etc.

If Julia ran on the JVM, that would’ve been super cool too. It’s not about being cool - it’s about the complexity of adding Java to a non-Java stack. That’s why there’s both the C/MRI Ruby and JRuby, for example.

Can we drop the discussion about Java now, please? We’re here to make Julia a better and more complete language, not to debate Java’s strengths and weaknesses. We (including you) have already agreed that JDBC is an option, but it’s far from ideal. Beyond that, it’s a guaranteed flamewar that will lead to the closure of the post.


I think we have a good group that has both extensive database and Julia experience, and we all really want to make this succeed.
Anybody else who is interested in contributing in any way (users are great for testing, documentation and helping prioritize features from their own use cases) should join us.
