Status of Julia database libraries

This is not correct in general. First, there's what @rsrock said, but these rates are also not applied to every grant. They typically apply to US federal government grants. Most universities negotiated these rates at some point (I believe with NIH), and they then apply to all federal grants. But Sloan, for example, has a rule that no institution can slap more than 15% overhead on any grant it gives, and universities simply accept that.

That’s true, but it doesn’t really invalidate the main point I was making: that NumFocus’ 10% overhead is very low. There are also universities that simply will not accept grants from organizations like Sloan that won’t go above 15%.

I’d like to address this point, since a lot of this delay is on my shoulders. First, there has actually been very little major feature work in the past six months. A lot of work has gone in, but the vast majority of it has been cleanup, to get as much of it as possible in for 1.0. The feature freeze accomplished what it was meant to: refocusing people from pie-in-the-sky features onto release preparation and cleanup for the 1.0 release. I should also point out that a feature freeze is not a freeze on breaking changes. That freeze comes much later, around the time we branch for release. Breaking changes do become more and more discouraged as the release process goes on, but the feature freeze is the start of that process, not the end of it.

When the feature freeze came around, there was one item that we explicitly still wanted to get in: the iteration protocol change. We had delayed working on this feature because it relies heavily on improved support for small unions, which took a while to develop. The iteration protocol changes were available by early January (two weeks or so after the feature freeze; I spent a good chunk of my winter holiday working on them), but we realized there was a problem: it blew up the compiler.

What happened was that, while small unions now had good support in type inference, and unions of bitstypes (e.g. Union{Float64, Nothing}) had very good support in codegen, this fell over as soon as you tried something slightly more complicated (e.g. putting the union in a tuple, or using a non-bitstype element in the union), resulting in an enormous (100x+) performance cliff. Now, we knew how to fix this problem (it’s a fairly standard compiler optimization problem), but we lacked the compiler infrastructure to implement the fix. We tried various things to hack around it (e.g. #25679; Jameson had an attempt as well, though I can’t find it right now), but were ultimately unsuccessful.

At that point we could have kicked out the iteration protocol change and started the 1.0 release process. In retrospect, given how long it took to get here, that may have been the right move. However, I was quite concerned about this performance cliff. Support for missing values and the related compiler optimizations are among the headline features of 1.0, and this performance cliff would quickly have led to disappointment with the performance of this functionality.

Because of this, we decided to go back and put the infrastructure in place to do this well. That took about 4 months, which is probably longer than it should have taken, and I apologize for that. However, an enormous amount of cleanup has gone in during that time, and I’m actually not convinced that I would want to go back and release julia as it was in January, even if I could. The 1.0 release will be much, much better because of all that cleanup, and I think that will be well reflected in the experience of new users when they first try 1.0.

Now, a couple of extra features have gone in as they became ready while we were waiting for the compiler to catch up, but I think that’s fine, and I don’t think they significantly delayed the release.

At the end of the day, when to call these release milestones is a judgement call, and for this release cycle it’s been particularly hard. However, one thing I think we’ve learned from doing julia releases over the years is that the milestones are absolutely critical as signals, much more so than strict adherence to them. The feature freeze was a signal to the developers working on julia to stop working on new features (or risk them not making the cut). The alpha is a signal to the package ecosystem to start paying attention to what has happened in julia base, and the betas and RCs are signals to start fixing packages.

Delaying the feature freeze until we were absolutely sure that there were no more features we wanted would have been a mistake, because focus would have been significantly diverted from cleanup and polish. I myself am surprised by how big a psychological effect these arbitrary deadlines/milestones have, but the effect is quite noticeable. Yes, doing things this way can lead to disappointment when things take longer afterwards, but having the milestones is nevertheless super important.

That’s not to say that things should take this long, of course (and I apologize for my part in the delay), but at least from where we stand today, we’ve made it over this hump. We’re well on our way to 1.0. The performance cliff I mentioned above is gone. The 0.7 beta is coming out today. Pkg3 has matured enough to be usable for the majority of workflows. A number of packages have started to adopt BinaryBuilder, which should lead to a much more pleasant experience when installing packages with binary dependencies. The package ecosystem is starting to upgrade. It certainly hasn’t been easy, but we’re getting there.

Databases could be better for sure, but I’m more concerned about the amount of negativity in the community where @StefanKarpinski and @Keno feel the need to write essays apologizing for something that all of us are getting for free.

In the 5 years I’ve been using Julia, learning about multiple dispatch and its implications has made me so much better a programmer that, even if Julia had never come out, I’d still be better off for having learned about it at all.

But yeah, it’s easy to complain about the work other people are(nt) doing.

To amplify what Randy said above: I am very impressed by the work that has gone into making v0.7.0 ready for release. Certainly I would have liked to see it come out before now, but, having worked over 50 years in this industry, I understand and appreciate how much effort is required to get a major release right.

I came upon Julia two years ago, when I decided that there was no alternative to rewriting some major Perl log parsers used to analyze the results of large-scale performance load tests. In fact, the Julia rewrite made the analysis 5 times faster. I am eagerly waiting to see how much more this can be improved, between what Julia v0.7/1.0 brings to the table and my much greater understanding of how to write better-performing Julia code.

I’m also one of those wishing for a more robust database handling infrastructure - many of our load tests add hundreds of millions of rows to our test data warehouse (SQL of course) for a single test. However I feel it is even more important to get the underlying language basics right even at the cost of the development pain we are currently experiencing.

My profound thanks go to the core developers and to all those who are contributing to a very vibrant package infrastructure. Overall this is a remarkable achievement that will only get better in the near term.

I don’t really feel that this is negativity that you are seeing.
We are all passionate fans of Julia.
We want to be able to use Julia all the time, at work, for fun side projects, instead of sleeping…

Neither Keno nor Stefan should feel the need to apologize, I think we all greatly appreciate their work (as I say probably toooo frequently :grin:), but I am happy that they both took the time to explain some things to those of us outside the inner circle.

We are trying to raise awareness of how critical database connectivity is, and to look for solutions.

I don’t feel it’s the responsibility of JC to “solve” this; they have quite enough on their plate as it is.
(That said, direction, kind words of encouragement, and any other help they could provide would be greatly appreciated: for example, maybe they already know of some companies that would be able to fund database connectivity development that JC simply hasn’t had the time or manpower to do at the moment.)

I don’t know if @quinnj’s day job would allow him to also accept funding for some database work (heaven knows he deserves it!), and I’ll ask where I’m consulting now (they might like the idea, since they also need good database connectivity in Julia). Maybe some of us with database expertise could band together, get some funding and route it via NumFocus [thanks for that suggestion, @StefanKarpinski], and get something done sooner rather than later.

Plotting isn’t good enough, databases aren’t good enough, R does better, there isn’t enough support for ML.

There’s a lot of this nonsense going on. Doesn’t matter how passionate you are, if you’re not part of the solution you are part of the problem. I’ve never worked anywhere where the person who can identify a problem is the hero; only the person who fixes it. Everything else in my opinion is endless negativity.

OK. That’s not at all what I’ve been seeing.

I’m used to customers talking to me about their use cases and their pain points with our software, and trying to figure out with them whether it was something that we could actually handle already (but maybe could document better; these weren’t random people, they were paying $$$ to come to the annual conference, so they’d already RTFM :grin:), or whether we could add something to our software to help them achieve their goals.
I found it was always useful to learn how the developers were using our software, it didn’t matter that they could not fix it themselves.

I especially liked going back the next year, if I was able to eliminate or alleviate their pain points, and show off the changes to them!

This I really disagree with, and to me, that’s the sort of negativity that can push away people who otherwise might later on be able to contribute.
When I think of the people who became huge contributors to Julia after getting help with their concerns on Gitter, I really wonder what would have happened if they had encountered that sort of attitude.

Explaining expectations (as @essenciary and @NaOH did) could help strengthen the group around https://github.com/JuliaDatabases/DBAPI.jl/issues/17.

If someone other than the core developers has to do the work in this area, it is good for them to see that people need it and will use it (nobody wants to write useless packages).

I see your post as an overreaction, and as something that brings negativity here.

If you are talking about what we are getting for free, you also need to count the people who want to teach Julia to children, because more young coders means a better language! :wink:

I understand that the people who want to teach are preparing their lesson plans and had some expectation (based on statements from the Julia core team!) of when 1.0 would be out. A one-year delay could be unpleasant for them. I think it is good to respect both kinds of work here! The core team is not the only one creating a good ecosystem.


I was wondering whether something like a “reference” implementation of DBAPI could help push things forward. If DBAPI is borrowed from Python, there is probably a way to get plenty of DB APIs “for free”!

Once this API (*) is designed, we could probably create an abstract PyCall layer usable with any Python DBAPI-compatible driver.

It could be slow, but probably useful in some scenarios.

(*) @ExpandingMan, what is the progress? If there are new problems, how could we help? (Last time it looked like a few hours of work for you :wink:)
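For context on the “borrowed from Python” idea: Python’s DB-API 2.0 (PEP 249) fixes a small set of connection and cursor methods, so code written only against those methods runs with any compliant driver. Below is a minimal sketch of such driver-agnostic code on the Python side, assuming the standard-library sqlite3 driver for the demo. run_query is a hypothetical helper; this is not a PyCall wrapper and not the DBAPI.jl design.

```python
import sqlite3
from typing import Any, List, Sequence, Tuple


def run_query(conn: Any, sql: str, params: Sequence[Any] = ()) -> Tuple[List[str], List[tuple]]:
    """Execute a statement on any PEP 249 (DB-API 2.0) connection.

    Only methods defined by PEP 249 are used (cursor, execute, description,
    fetchall, close), so the function itself is driver-agnostic. The
    placeholder syntax in `sql` still depends on the driver's `paramstyle`.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)
        # description is None for statements that return no rows (e.g. DDL);
        # otherwise it is a sequence of 7-item tuples whose first item is the column name
        if cur.description is None:
            return [], []
        columns = [d[0] for d in cur.description]
        return columns, cur.fetchall()
    finally:
        cur.close()


if __name__ == "__main__":
    # Driver-specific setup: only the connect call names a concrete module
    conn = sqlite3.connect(":memory:")
    run_query(conn, "CREATE TABLE t (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
    print(sqlite3.paramstyle)
    print(run_query(conn, "SELECT id, name FROM t WHERE id > ?", (1,)))
```

Even in this sketch one driver-specific detail leaks through: placeholder syntax is governed by each module’s paramstyle (sqlite3 uses ?), which is the kind of difference a Julia layer over these drivers would still have to handle.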

I don’t think people write free software primarily for the joy of seeing other people use it. Sure, that’s part of the payoff, but free software is primarily written because its authors find it useful, and making it open source is the best way to collaborate and bring in further contributors.

Which is why I don’t really see the point of “I am not satisfied with part X of the language/package ecosystem” posts. They ultimately boil down to cajoling, exhorting, or ostensibly but nonsensically threatening (“if I don’t get X, I will not use your language”, sometimes worded more politely) other people to do work for you.

People here take this amiably and respond politely, but these discussions still accomplish little.

A few points:

  1. “if I don’t get X, I will not use your language”

Based on the stories above, the correct statement is “If we don’t get X, it is technically impossible for us to use your language for our project.”

  2. i) I am a scientist; ii) I don’t use DBs; iii) Therefore, scientists do not use DBs.

This is a crude caricature of @Tamas_Papp 's argument, but you get the point quickly. I just now googled GIS SQL. Sure enough, databases are very important for geographic information systems.

  3. Rather than (self-professed) rants, it’s probably more useful to write a dispassionate account of the capabilities and limitations of Julia libraries. It’s certainly useful to have a good idea of whether you can complete a project in Julia before you start. I guess @essenciary is frustrated because the issue has already been raised a few times… So, I don’t know… maybe a web page listing high-priority, under-staffed/-funded feature development, including real stories of what cannot be done, would help in allocating resources. (My search found nothing existing, but I would not be surprised if someone has a link.) This would no doubt be more useful than scattered Discourse posts saying “I really need X”, followed by a bunch of posts saying either “Me too” or “Do it yourself”.

  4. I tried to find a detailed history of Python, but was unable to. I wonder how long it took Python to get a solid database interface? My guess is at least a few years. (Remember that Perl and Python were around for a few years before HTTP.) My recollection, as a bystander, is that it took many years for Python to get financial support from institutions… Of course, the standard for launching a language today is different than it was in 1989. Still, Julia’s relative level of financial support today is closer to the early years of Python than it is to, say, Go.

EDIT: But, with JuliaCon and v1.0 so close, the reality is none of this is going to happen before v1.0 is out.

But there is no such “technical” impossibility, as one can always implement DB packages in Julia. It is just easier if someone else does that.

You misrepresent my position, which is the following:

  1. FOSS libraries for DBs will probably be implemented by people who use them,

  2. the best way to make this happen is contributing.

Well, for many people, it really is an impossibility, since they simply don’t have the technical skills in that area (even though they may be super smart and experts in their field), or, even if they did, they don’t have the time to both write a whole DB package for Julia and get their real project built on time.

When people say that, I don’t see that as some sort of threat (which would be silly), I see that as somebody who really, really, really would like to be able to use Julia (for all the reasons that we love Julia), but is very :sob: that they can’t (at this point).

Again, most people who use them may not have the skills to implement them.
People who do have the skills (and background knowledge) to do so, if they have to do so for their job, may not be able to make the resulting package open source (that happened to me, for almost all of the Julia code that I wrote over the last 3 years).

Hopefully, this can be handled another way: either via a grant or some crowdfunding scheme, to pay for a small group of Julians to work on this (I’d love to do nothing more than spend 100% of my time improving the Julia ecosystem, however, I have to earn enough money to pay our mortgages, feed my family, and save some for retirement + college for my two boys).

Or, possibly, patience. My impression is that people are working on these issues actively, and they will be resolved at some point.

We need a #gripes channel in discourse :slightly_smiling_face:

While these threads are often somewhat painful to read (I believe it’s hard to devote time to a project X for free and mostly get feedback like “Julia will fail because X is not good enough”), they are also encouraging. A few years ago (when starting up Julia without loading any package took an eternity, plotting was only available via PyPlot, and there was no Juno or VS Code IDE) nobody was expecting good native Julia packages for most things; it was more of a DIY language. Now Julia is starting to offer native packages that solve most problems, in some cases on par with the competition, in others better, and in others worse. As a consequence, it is attracting a different type of user, one that expects a more polished experience where everything they need is already there.

While it’s certainly positive that this type of user base starts using Julia eventually, I don’t think it is crucial to convince them “urgently” as they are unlikely to contribute to the language and ecosystem. Still, it is encouraging that even for users who only care about using pre-existing libraries, Julia has become a plausible option: this I think is a major success.

On the topic of funds, I would like to draw attention again to this.

I had requested that all proposals be submitted to me by June 22 (please read the whole thread), but I am happy to extend until June 25, if there is interest in proposing work on DB libraries as part of this. Also do note, that NumFocus runs this on an ongoing basis, so this is not the only time to submit.

We could also seriously consider having a GSoC project or two on this topic next year.

I think it is a good start to voice these concerns and to gather others who care about such functionality. Money is not the hardest part, in my opinion. What happens after the work on a set of high-quality db access libraries is funded?

Someone still has to maintain them - and the community that owns and maintains those packages will need to be motivated by their user communities. This is a good start, but we probably need more people to file issues, design considerations, and perhaps even consolidate packages. These are the questions anyone funding will ask - because their biggest question often is whether work they fund now will continue to be maintained, or whether it needs ongoing funds. Of course, reaching 1.0 helps in a big way.

-viral

Maybe I’m a “glass half-full” kind of guy, but to me, if I’m working on X, and get the feedback that X is so important that people feel that Julia might fail without a better X, that just vindicates to me that the time I’m spending on X is justified and that X will be useful when finished.

For example, lately we’ve been pointing out a lot of issues with Pkg3, but that’s NOT at all because 1) we don’t appreciate the massive work that Stefan and Kristoffer have been doing or 2) we think they are doing it all wrong (I’m pretty happy with what I’ve seen), it’s because we really really want it to succeed.

Irrespective of the merits/demerits of the OP’s (@essenciary, Adrian) post, he has contributed significantly to the Julia ecosystem.

It’s great to hear everyone chiming in here and seeing all the need/want for database support in Julia! I’ll give a couple thoughts on current/future development of database support in Julia:

  • I actually don’t think database support is too bad in Julia; as others have mentioned, between JDBC.jl and ODBC.jl, one can at least connect and do basic things w/ most databases and at least between MySQL.jl, SQLite.jl, ODBC.jl, and LibPQ.jl, they all sport roughly similar interfaces not unlike a DBAPI set of methods.

  • Now obviously there are issues as well: outstanding GitHub issues, performance opportunities, broader support across platforms. But IMO, there’s nothing a week or two of diving in and fixing things can’t solve. I’ll try over the next week or two to do just that.

  • But in terms of broader API design, I know there’s been a little recent interest in reviving something like DBAPI.jl, but as with the 3-5 times this has happened in the past, nothing has really materialized beyond the initial rally of “hey, maybe we need something like this!”

  • For me, I’ve never personally seen a ton of value in having something like DBAPI.jl; sure, in the simplest cases, you might be able to swap out one db w/ another. But realistically, how often do you really need to swap databases within an application? Sure, there’s value in being able to “guess” how to connect to a new database, but IMO, it’s worth 15-30 minutes of reading up on a new database if you’re thinking about using one (or have to) in an application. There’s a reason there are so many databases: no two are exactly the same, and there are important differences in usage and APIs to be aware of. There’s certainly value in trying to keep basic operations as similar as possible (executing a query, fetching results, etc.), but there will always be additional db-specific operations/functionality to be aware of.

  • On that note, I think there’s certainly some cleanup due that would help unify some of the APIs across the existing packages. One of the advantages of the current API of SQLite.jl, MySQL.jl, and ODBC.jl is that through implementing the DataStreams interface, tables of all kinds (other databases, dataframes, file formats, etc.) can be input or output automatically.

  • Call to action: I’ve created the #databases slack channel to help facilitate ongoing discussion on issues and APIs. Heavy users, contributors (potential and otherwise), and anyone else are welcome to join and help. In particular, it would be great to channel all this passion towards organizing efforts to help fix issues, increase platform coverage, and unify APIs. If you’re willing to help test a database on a platform or two, please chime in.
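The DataStreams point above is that once every package reads and writes tables through one shared interface, any source can feed any sink without a converter for each pair. The Python sketch below is only an analogy for that source/sink idea, not the DataStreams.jl API: the (column names, row iterator) convention and all four function names are hypothetical, and the standard-library sqlite3 and csv modules stand in for real packages.

```python
import csv
import io
import sqlite3
from typing import Iterator, List, Sequence, Tuple

# Hypothetical convention: a "table" is (column names, iterator over row tuples).
Table = Tuple[List[str], Iterator[Sequence]]


def sqlite_source(conn: sqlite3.Connection, sql: str) -> Table:
    """Read a query result as a Table; rows are pulled lazily from the cursor."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, iter(cur)


def csv_source(text: str) -> Table:
    """Read CSV text (header row first) as a Table; all values come back as strings."""
    reader = csv.reader(io.StringIO(text, newline=""))
    columns = next(reader)
    return columns, (tuple(row) for row in reader)


def csv_sink(table: Table) -> str:
    """Write any Table out as CSV text."""
    columns, rows = table
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(columns)
    writer.writerows(rows)
    return out.getvalue()


def sqlite_sink(table: Table, conn: sqlite3.Connection, name: str) -> None:
    """Load any Table into a new SQLite table.

    Table and column names are interpolated into SQL directly, which is only
    acceptable for trusted names like the ones in this demo.
    """
    columns, rows = table
    conn.execute(f"CREATE TABLE {name} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])

    text = csv_sink(sqlite_source(db, "SELECT id, name FROM t"))  # database -> CSV
    sqlite_sink(csv_source(text), db, "t_copy")                   # CSV -> database
    print(db.execute("SELECT * FROM t_copy").fetchall())
```

The design choice being illustrated: with two sources and two sinks sharing one shape, all four pairings work, and adding a fifth format means writing one reader or writer rather than one converter per existing format.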

Cheers everyone!
