Status of Julia database libraries

Databases could be better for sure, but I’m more concerned about the amount of negativity in the community where @StefanKarpinski and @Keno feel the need to write essays apologizing for something that all of us are getting for free.

In the 5 years I’ve been using Julia, I’ve become such a better programmer having learned about the concept and implications of multiple dispatch that if Julia never came out, I’d still be better off having learned about it at all.

But yeah, it’s easy to complain about the work other people are(n’t) doing.

46 Likes

A further amplification on what Randy said above. I am very impressed by the work that has gone into making v0.7.0 ready for release. Certainly I would have liked to have seen it come out before now but, having worked over 50 years in this industry, I understand and appreciate how much effort is required to get a major release right.

I came upon Julia two years ago when I decided that there was no alternative to rewriting some major log parsers written in Perl used to analyze results of large scale performance load tests. In fact the Julia rewrite resulted in a factor of 5 improvement in analysis time. I am eagerly waiting to see how much more this can be improved between what Julia v0.7/1.0 brings to the table and my much greater understanding of how to write better performing Julia code.

I’m also one of those wishing for a more robust database handling infrastructure - many of our load tests add hundreds of millions of rows to our test data warehouse (SQL of course) for a single test. However I feel it is even more important to get the underlying language basics right even at the cost of the development pain we are currently experiencing.

My profound thanks go to the core developers and to all those who are contributing to a very vibrant package infrastructure. Overall this is a remarkable achievement that will only get better in the near term.

11 Likes

I don’t really feel that this is negativity that you are seeing.
We are all passionate fans of Julia.
We want to be able to use Julia all the time, at work, for fun side projects, instead of sleeping…

Neither Keno nor Stefan should feel the need to apologize, I think we all greatly appreciate their work (as I say probably toooo frequently :grin:), but I am happy that they both took the time to explain some things to those of us outside the inner circle.

We are trying to raise the awareness of how critical database connectivity is, and look for solutions.

I don’t feel it’s the responsibility of JC to “solve” this - they have quite enough on their plate as it is
(although direction, kind words of encouragement, and any other help they could provide [for example, maybe they already know of some companies that would be able to fund some database connectivity development, that JC simply hasn’t had the time / manpower to do at the moment], would be greatly appreciated)

I don’t know if @quinnj’s day job would allow him to also accept funding for some database work (heaven knows he deserves it!), and I’ll ask where I’m consulting now (they might like the idea, since they also need good database connectivity in Julia). Maybe some of us with database expertise could band together, get some funding and route it via NumFocus [thanks for that suggestion, @StefanKarpinski], and get something done sooner rather than later.

8 Likes

Plotting isn’t good enough, databases aren’t good enough, R does better, there isn’t enough support for ML.

There’s a lot of this nonsense going on. Doesn’t matter how passionate you are, if you’re not part of the solution you are part of the problem. I’ve never worked anywhere where the person who can identify a problem is the hero; only the person who fixes it. Everything else in my opinion is endless negativity.

9 Likes

OK. That’s not at all what I’ve been seeing.

I’m used to customers talking to me about their use cases, and their pain points with using our software, and trying to figure out with them whether it was something that we actually could handle already (but maybe could document better how to achieve that - these weren’t random people, they were paying $$$ to come to the annual conference - they’d already RTFM :grin:), or whether it was something that we could add something to our software to help them achieve their goals.
I found it was always useful to learn how the developers were using our software, it didn’t matter that they could not fix it themselves.

I especially liked going back the next year, if I was able to eliminate or alleviate their pain points, and show off the changes to them!

This I really disagree with, and to me, that’s the sort of negativity that can push away people who otherwise might later on be able to contribute.
I really wonder what might have happened when I think of the people who ended up being huge contributors to Julia after being helped out with their concerns on Gitter, if they had encountered that sort of attitude.

9 Likes

Explaining expectations (as @essenciary or @NaOH did, for example) could help to strengthen the group around https://github.com/JuliaDatabases/DBAPI.jl/issues/17.

If the work in this area has to be done by someone other than the core team, it is good for them to see that people need it and will use it (nobody wants to make useless packages).

I see your post as an overreaction, as something which brings negativity here.

If you are talking about what we are getting for free, you also need to count the people who want to teach children Julia. Because more young coders means a better language! :wink:

And I understand that the people who want to teach are preparing their teaching plans and had some expectation (based on words from Julia core!) of when 1.0 would be out. A one-year delay could be unpleasant for them. I think it is good to respect both kinds of work here! Core is not the only one creating a good ecosystem.


I was wondering whether something like a “reference” implementation of DBAPI could help to push things forward. If DBAPI is borrowed from Python, there would probably be a possibility to get plenty of DB APIs “for free”!

If we had this API designed (*), we could probably create an abstract PyCall layer usable with any Python DBAPI-compatible solution.

It could be slow but probably useful for some scenarios.
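For readers who have not seen it, the Python DBAPI (PEP 249) surface being discussed is small. Below is a minimal sketch in Python using the standard-library `sqlite3` driver. The table and column names are made up for illustration, and how a Julia layer over PyCall would wrap these calls is not shown here and remains an open design question.

```python
import sqlite3

# connect() returning a connection object is the DB-API 2.0 entry point;
# the ":memory:" argument is specific to SQLite.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical table, for illustration only.
cur.execute("CREATE TABLE results (test_name TEXT, elapsed_ms INTEGER)")
cur.executemany(
    "INSERT INTO results VALUES (?, ?)",
    [("login", 120), ("search", 340)],
)
conn.commit()

# execute() followed by fetchall() is the basic query path.
cur.execute("SELECT test_name, elapsed_ms FROM results ORDER BY elapsed_ms")
rows = cur.fetchall()
conn.close()
```

Any driver that follows PEP 249 exposes the same `connect` / `cursor` / `execute` / `fetchall` shape, which is what would make a single generic wrapper plausible.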

(*) - @ExpandingMan what is the progress? If there are some new problems, how could we help? (Last time it looked like a few hours of work for you :wink: )

I don’t think people write free software primarily for the joy of seeing other people use it. Sure, that’s part of the payoff, but primarily free software is written because its authors find it useful, and making it open source is the best way to collaborate and bring in further contributors.

Which is why I don’t really see the point of “I am not satisfied with part X of the language/package ecosystem” posts. They ultimately boil down to cajoling, exhorting, or ostensibly but nonsensically threatening (“if I don’t get X, I will not use your language”, sometimes worded more politely) other people to do work for you.

People here take this amiably and respond politely, but these discussions still accomplish little.

8 Likes

A few points:

  1. “if I don’t get X, I will not use your language”

Based on the stories above, the correct statement is “If we don’t get X, it is technically impossible for us to use your language for our project.”

  2. i) I am a scientist; ii) I don’t use DBs; iii) Therefore, scientists do not use DBs.

This is a crude caricature of @Tamas_Papp’s argument, but you get the point quickly. I just now googled GIS SQL. Sure enough, databases are very important for geographic information systems.

  3. Rather than (self-professed) rants, it’s probably more useful to write a dispassionate account of capabilities and limitations of Julia libraries. It’s certainly useful to have a good idea of whether you can complete a project in Julia before you start. I guess @essenciary is frustrated because the issue has already been raised a few times… So, I don’t know… maybe a web page listing priority, under-staffed/-funded feature development, including real stories of what cannot be done, would help in allocating resources. (My search found nothing existing, but I would not be surprised if someone has a link.) This would no doubt be more useful than scattered Discourse posts saying “I really need X”, followed by a bunch of posts saying either “Me too” or “Do it yourself”.

  4. I tried to find a detailed history of Python, but was unable to. I wonder how long it took Python to have a solid database interface? My guess is at least a few years. (Remember that Perl and Python were around for a few years before HTTP.) My recollection, as a bystander, is that it took many years for Python to get financial support from institutions… Of course, the standard for launching a language today is different than it was in 1989. Still, Julia’s relative level of financial support today is closer to the early years of Python than it is to, say, Go.

EDIT: But, with JuliaCon and v1.0 so close, the reality is none of this is going to happen before v1.0 is out.

2 Likes

But there is no such “technical” impossibility, as one can always implement DB packages in Julia. It is just easier if someone else does that.

You misrepresent my position, which is the following:

  1. FOSS libraries for DBs will probably be implemented by people who use them,

  2. the best way to make this happen is contributing.

4 Likes

Well, for many people, it really is an impossibility, since they simply don’t have the technical skills in that area (even though they may be super smart and experts in their field), or, even if they did, they don’t have the time to both write a whole DB package for Julia and get their real project built on time.

When people say that, I don’t see that as some sort of threat (which would be silly), I see that as somebody who really, really, really would like to be able to use Julia (for all the reasons that we love Julia), but is very :sob: that they can’t (at this point).

Again, most people who use them may not have the skills to implement them.
People who do have the skills (and background knowledge) to do so, if they have to do so for their job, may not be able to make the resultant package open source (that happened to me, for almost all of the Julia code that I wrote over the last 3 years).

Hopefully, this can be handled another way: either via a grant or some crowdfunding scheme, to pay for a small group of Julians to work on this (I’d love to do nothing more than spend 100% of my time improving the Julia ecosystem, however, I have to earn enough money to pay our mortgages, feed my family, and save some for retirement + college for my two boys).

5 Likes

Or, possibly, patience. My impression is that people are working on these issues actively, and they will be resolved at some point.

4 Likes

We need a #gripes channel in discourse :slightly_smiling_face:

3 Likes

While these threads are often somewhat painful to read (I believe it’s hard to devote time to a project X for free and primarily get as feedback a “Julia will fail because X is not good enough”), they are also encouraging. A few years ago (when starting up Julia without loading any package took an eternity, plotting was only available via pyplot, there was no Juno or VS Code IDE) nobody was expecting good native Julia packages for most things; it was more of a DIY language. Now Julia is starting to offer native packages to solve most problems, in some cases equivalent to the competition, in other cases better and in other cases worse. As a consequence, it is getting a different type of user, one who expects a more polished user experience where everything they need is already there.

While it’s certainly positive that this type of user base starts using Julia eventually, I don’t think it is crucial to convince them “urgently” as they are unlikely to contribute to the language and ecosystem. Still, it is encouraging that even for users who only care about using pre-existing libraries, Julia has become a plausible option: this I think is a major success.

9 Likes

On the topic of funds, I would like to draw attention again to this.

I had requested that all proposals be submitted to me by June 22 (please read the whole thread), but I am happy to extend until June 25 if there is interest in proposing work on DB libraries as part of this. Also note that NumFocus runs this on an ongoing basis, so this is not the only time to submit.

We could also seriously consider having a GSoC project or two on this topic next year.

I think it is a good start to voice these concerns and to gather others who care about such functionality. Money is not the hardest part, in my opinion. What happens after the work on a set of high quality db access libraries is funded?

Someone still has to maintain them - and the community that owns and maintains those packages will need to be motivated by their user communities. This is a good start, but we probably need more people to file issues, design considerations, and perhaps even consolidate packages. These are the questions anyone funding will ask - because their biggest question often is whether work they fund now will continue to be maintained, or whether it needs ongoing funds. Of course, reaching 1.0 helps in a big way.

-viral

25 Likes

Maybe I’m a “glass half-full” kind of guy, but to me, if I’m working on X, and get the feedback that X is so important that people feel that Julia might fail without a better X, that just vindicates to me that the time I’m spending on X is justified and that X will be useful when finished.

For example, lately we’ve been pointing out a lot of issues with Pkg3, but that’s NOT at all because 1) we don’t appreciate the massive work that Stefan and Kristoffer have been doing or 2) we think they are doing it all wrong (I’m pretty happy with what I’ve seen), it’s because we really really want it to succeed.

3 Likes

Irrespective of the merits/demerits of the OP’s (@essenciary, Adrian) post, he has contributed significantly to the Julia ecosystem.

6 Likes

It’s great to hear everyone chiming in here and seeing all the need/want for database support in Julia! I’ll give a couple thoughts on current/future development of database support in Julia:

  • I actually don’t think database support is too bad in Julia; as others have mentioned, between JDBC.jl and ODBC.jl, one can at least connect and do basic things w/ most databases and at least between MySQL.jl, SQLite.jl, ODBC.jl, and LibPQ.jl, they all sport roughly similar interfaces not unlike a DBAPI set of methods.

  • Now obviously there are issues as well: outstanding github issues, performance opportunities, broader support across platforms. But IMO, there’s nothing a week or two of diving in and fixing things can’t solve. I’ll try over the next week or two to do just that.

  • But in terms of broader API design, I know there’s been a little recent interest in reviving something like DBAPI.jl, but, as with the 3-5 times this has happened in the past, nothing has really manifested beyond the initial rally of “hey, maybe we need something like this!”

  • For me, I’ve never personally seen a ton of value in having something like DBAPI.jl; sure, in the simplest cases, you might be able to swap out one db w/ another. But realistically, how often do you really need to be swapping databases within an application? Sure there’s value in knowing how to “guess” how to connect to a new database, but IMO, it’s worth 15-30 minutes of reading up on a new database if you’re thinking about using it (or have to) in an application. There’s a reason there are so many databases: no two are exactly the same and there are important differences in usage and APIs to be aware of when using them. There’s certainly value in trying to keep basic operations as similar as possible (executing a query, fetching results, etc); but there will always be additional db-specific operations/functionality to be aware of.

  • On that note, I think there’s certainly some cleanup due that would help unify some of the APIs across the existing packages. One of the advantages of the current API of SQLite.jl, MySQL.jl, and ODBC.jl is that through implementing the DataStreams interface, tables of all kinds (other databases, dataframes, file formats, etc.) can be input or output automatically.

  • Call to action: I’ve created the #databases slack channel to help facilitate ongoing discussion on issues and APIs. Heavy users, contributors (potential and otherwise), and anyone else are welcome to join and help. In particular, it would be great to channel all this passion towards organizing efforts to help fix issues, increase platform coverage, and unify APIs. If you’re willing to help test a database on a platform or two, please chime in.
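To illustrate the point above that even a shared interface leaves db-specific differences, here is a small sketch in Python, whose DB-API (PEP 249) is the model DBAPI.jl borrows from. PEP 249 lets each driver module declare its own `paramstyle`, so "generic" code still has to adapt to it. The helper `insert_row` and the table name are hypothetical, only two of the five paramstyles are handled, and the table name is assumed to be trusted input.

```python
import sqlite3

# Placeholder text for two of the paramstyles defined by PEP 249.
# The others (numeric, named, pyformat) are left out of this sketch.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}

def insert_row(driver, conn, table, values):
    """Insert one row using the placeholder style the driver module declares.

    `driver` is any DB-API 2.0 module; `table` must be a trusted identifier,
    since it is interpolated into the SQL text.
    """
    mark = PLACEHOLDERS[driver.paramstyle]
    marks = ", ".join([mark] * len(values))
    cur = conn.cursor()
    cur.execute(f"INSERT INTO {table} VALUES ({marks})", tuple(values))
    conn.commit()

# Usage with the standard-library SQLite driver, which declares "qmark".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (name TEXT, rows_added INTEGER)")
insert_row(sqlite3, conn, "runs", ("nightly", 1000))
stored = conn.execute("SELECT name, rows_added FROM runs").fetchall()
```

The basic operations (connect, execute, fetch) line up across drivers, while details such as placeholders, types and bulk loading still have to be learned per database.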

Cheers everyone!

39 Likes

Thank you all for the amazing feedback - I’ve been reading carefully and I think a lot of useful information has surfaced.

1 - there actually are ways in which such “infrastructure” projects can be funded. @viralbshah NumFocus sounds great and I’m happy to look into it. The June 25th timeframe is unrealistic as far as I’m concerned (especially as I’m clueless in regards to this kind of work) but I’m confident that I can prepare something for the next application period.

2 - I won’t delve into @randyzwitch’s reply – everybody is entitled to their own opinion and he expressed his in a non-offending and polite manner and I respect that. However, as far as I’m concerned (and I think I speak for many other contributors):
a. I did put a couple of thousand hours of my personal time into open source Julia development. This is not directly related to core Julia or scientific computing but nonetheless, it can have a meaningful positive impact on the language’s and community’s development. If I’m not interested or competent to work on low level DB libraries, even if I need them, it doesn’t mean that I can’t help in other ways: using the software, reporting issues, raising awareness through posts like this and developing other useful software.
b. as contributors, we do have certain expectations that the language will support features which allow us to focus on the areas of development that interest us. This is a truism: if it wasn’t so, we wouldn’t use an existing language, but rather each of us would develop their own language from scratch.

3 - I’m not the only one who is directly affected by the state of the DB libraries. Before this, the default approach was “DB access is irrelevant for my work, good bye”. I’m happy to hear that this is not the case, and I believe we made a clear case that we, as a community, need this.

4 - @quinnj
a. thank you so much for putting this higher on your list of priorities and undertaking this massive effort!
b. I wouldn’t worry about DBAPI. First, we need to have basic CRUD functionality which is stable and performant. I don’t see a lot of value in DBAPI either, simply because I prefer higher level ORMs.
c. on the same topic, instead of seeing precious time going towards things like DBAPI, after stable CRUD features I believe we’d get more value from DBMS specific support (like for example Postgres pub-sub or support for Postgres JSON types).
d. I have discarded ODBC and JDBC because, frankly, they’re not standard for web development. They are cumbersome, limited and not as performant as the native libraries. Also, having to install a Java stack just to access the DB would make a sysadmin hate my guts. As a CTO or architect, I could never justify such a thing to my team. However, I will test them next, in my personal project, in the same conditions – I’m curious to see if they work. I will report back on this.
e. great to hear about the new channel, thank you. It’s of utmost importance to leverage the power of open source contributions. There’s nothing more discouraging than reporting issues and not hearing back for weeks (or ever).
f. I will allocate some time to help track down the performance bottleneck in MySQL.jl / Julia 0.7. I don’t know if I’ll be able to fix it, but at least I’ll pinpoint the source of the problem.

Thanks again to everybody that took the time to express their opinions, concerns and to reply and inform us!

25 Likes

Julia ORM would be great!

I don’t know which rock you’ve been hiding under for the last 20 years but JDBC (and I guess ODBC also in the .NET/MS world) is very standard in web dev that’s based on the JVM, and since Java is the number one language overall and very dominant in server side dev (especially anything “enterprise”) that’s a lot of web dev! Also almost all of the big data server frameworks are Java or Scala based and hence their db access is also based on JDBC. Note that all the JVM ORM frameworks also use JDBC underneath (these do have potential performance problems if used naïvely but that’s due to querying not JDBC).

I’ve been doing Java dev for 20+ years and never really had any performance issues due to JDBC. If your query is slow then it’s usually the query/indexing on the DB, not the performance of JDBC. Sure, you might squeeze out a little more performance for mass loading/dump via a C API, but is it worthwhile? Not for regular web dev.
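The claim that slowness usually sits in the query/indexing rather than in the driver is easy to check on a toy case. The following is only a sketch in Python with the standard-library SQLite driver (not JDBC); the table, column and index names are made up for illustration. It asks the database for its query plan before and after adding an index, with the driver call staying exactly the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, for illustration only.
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, user_name TEXT, payload TEXT)"
)
conn.executemany(
    "INSERT INTO events (user_name, payload) VALUES (?, ?)",
    [(f"user{i % 100}", "x") for i in range(1000)],
)

query = "SELECT payload FROM events WHERE user_name = ?"

def plan(conn, sql, params):
    """Return the 'detail' text (last column) of SQLite's EXPLAIN QUERY PLAN rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]

# Without an index, the plan is expected to report a scan of the table.
before = plan(conn, query, ("user7",))

conn.execute("CREATE INDEX idx_events_user ON events (user_name)")

# With the index, the plan is expected to mention idx_events_user.
after = plan(conn, query, ("user7",))
```

Comparing `before` and `after` shows the access path changing inside the database, which is the part that no connectivity layer, JDBC or otherwise, can speed up on its own.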

The great advantage of JDBC is that it’s simple and platform independent. You can use the same driver on any platform for the same DB and it will just work. No complex setup or config: just add the driver to your classpath, set the username & password and you’re set to go. That’s worth gold.

I think that using JDBC from Julia to access DBs is a good and valid option, and the overhead of instantiating a JVM isn’t that big a deal, especially since the other options aren’t that stable at the moment. With JDBC all the heavy lifting is already done by the JDBC driver; you just need some simple JavaCall calls from Julia to make it work.