TL;DR
My experience with the DB access libraries has been overwhelmingly negative, and I’m concerned that the Julia ecosystem is missing a huge foundational piece for success. Things don’t look any better as we move towards v1.0: the respective repos are almost deserted in terms of activity. These libraries are too important to be left at the mercy of open source contributors and we need a solution fast. A v1.0 language with v0.1 DB access libraries is of no use for a wide range of projects.
Rant
At the risk of sounding like a broken record (it was only a year ago that I raised the alarm about Julia losing support for PostgreSQL when its only library at that point became abandonware), I’m afraid I have to say it again: I consistently run into critical issues with Julia’s DB libraries. The packages are underdeveloped, insufficiently tested and poorly supported.
From my perspective (my focus being web development and general computing with Julia), this has got to be one of the worst possible problems.
To put things in context, I’m using Julia and Genie to develop a reasonably complex web app for handling hotel bookings and reservations. It’s really nothing crazy but it’s not a toy/theoretical project either: we’re talking about some 300K+ hotels in a DB table plus related data.
I’m also the developer of SearchLight, a Julia ORM, so I’ve had a lot of exposure in the last few years to the lower level DB libraries (MySQL, SQLite and PostgreSQL). Thanks to SearchLight I can easily swap DB backends and, as is customary in web development, I started with a SQLite DB in development. SQLite seemed to work until I started importing the data for the 300K hotels from JSON files. After a few thousand DB operations during the import, Julia would segfault. I reported the issue here; there has been no response yet despite the severity of the problem: https://github.com/JuliaDatabases/SQLite.jl/issues/146
OK, that was bad. But luckily, I could switch to MySQL. That worked OK in v0.6 but turns out to be unusable in v0.7. OK, sure, “it’s pre v1.0, what can you expect”, I know the story, but bear with me, I’ll address this. Here again, it usually works, but under production requirements you end up with horrible problems, such as being unable to run a simple SELECT query. Look at this: 45 s, 200 M allocations and 12 GiB, for a query which takes 3.1 ms in the MySQL client.
julia> @time MySQL.query(conn, "SELECT `hotels`.`id` AS `hotels_id`, `hotels`.`facilities` AS `hotels_facilities`, `hotels`.`themes` AS `hotels_themes`, `hotels`.`nr_rooms` AS `hotels_nr_rooms`, `hotels`.`country` AS `hotels_country`, `hotels`.`destination` AS `hotels_destination`, `hotels`.`nr_bars` AS `hotels_nr_bars`, `hotels`.`hotel_score` AS `hotels_hotel_score`, `hotels`.`nr_restaurants` AS `hotels_nr_restaurants`, `hotels`.`availability_score` AS `hotels_availability_score`, `hotels`.`checkin_to` AS `hotels_checkin_to`, `hotels`.`checkout_from` AS `hotels_checkout_from`, `hotels`.`address` AS `hotels_address`, `hotels`.`zipcode` AS `hotels_zipcode`, `hotels`.`currencycode` AS `hotels_currencycode`, `hotels`.`regions` AS `hotels_regions`, `hotels`.`checkin_from` AS `hotels_checkin_from`, `hotels`.`phone` AS `hotels_phone`, `hotels`.`descriptions` AS `hotels_descriptions`, `hotels`.`longitude` AS `hotels_longitude`, `hotels`.`name` AS `hotels_name`, `hotels`.`email` AS `hotels_email`, `hotels`.`latitude` AS `hotels_latitude` FROM `hotels` WHERE (`hotels`.`id` = 218) ORDER BY hotels.id ASC LIMIT 1")
44.534217 seconds (202.54 M allocations: 12.060 GiB, 0.75% gc time)
Here is the full issue: https://github.com/JuliaDatabases/MySQL.jl/issues/113
In v0.6 there was also a problem with a finalizer error (and there still is in v0.7). Again, not an issue when “playing” with MySQL, but when performing tens of thousands of operations, the error output would clog the REPL and slow it down until it had to be killed. Not ideal for web apps which are expected to run uninterrupted for months at a time.
Why is this a very bad thing?
- First of all, because it all seems to work. But in fact, these libraries have not been properly tested and these problems are only discovered late in the development process when it’s extremely costly to switch to a different tech stack (honestly, you’d be forced to drop Julia altogether because one just can’t deliver a product in these conditions).
- Then, when things break, you’re pretty much on your own. You wanted to show off Julia; well, tough luck, now you have to explain why basic features don’t work. Luckily, this is a project I’m doing with friends in our spare time, so no huge loss. Sadly, it’s impossible in these conditions to recommend Julia at work. It’s cool that we have DataFrames and Query and DataStreams, but if these run on such a shaky foundation, they’ll crumble.
- “Yes, but it’s Open Source and it’s pre 1.0 - it’s normal”. Yes, but these are infrastructure-critical projects (I said it before and I will repeat it every time I can). They’re right up there with file IO. Look at Elixir, to take an example of a highly successful new language (to stay in the realm of web development, which I know well): the core team released the language, the web framework (Phoenix) and the ORM (Ecto). They are all under the core team’s umbrella.
And what’s worse, it doesn’t look like things are on the right track either, even as we’re getting close to v1.
Look at the state of the contributions - it doesn’t look good for v0.7 / 1.0.
- MySQL: https://github.com/JuliaDatabases/MySQL.jl/graphs/contributors
- SQLite: https://github.com/JuliaDatabases/SQLite.jl/graphs/contributors
Don’t get me wrong, the contributors are Open Source heroes and I’m grateful for all their work. It’s not a statement about them. They’re doing amazing work but it is what it is: they need help. I suggested in the past that Julia Computing use its organizational and funding clout and experience to properly manage these DB “infrastructure” libraries. I still think that’s the best approach, but maybe other people have different and better ideas.
If I’m wrong or too pessimistic, please correct me; I could really do with a more optimistic version of events. Because the way things are today, I basically can’t use Julia to build pretty much anything. And that is frustrating as hell, especially as things are worse than a year ago, when I was raising the same problems about Postgres. Yes, Julia is an amazing language and there are amazing features and libraries which work very well, but for web development we need proper DB access libraries.