Julia in production: examples and share your experience

I use Julia for my (data science) job and have a couple of Julia programs that are up “in production”.

Primarily my take on this is that, in 2019, everyone doing everything via “microservices” makes using Julia really not that much different from using anything else. Even the people at my company using Python are, for the most part, not directly making calls to other in-house Python libraries, but rather depending on IO. This has been a godsend for me: where I work it’s nigh on impossible to get anyone to use Julia (though there are some cases, particularly where JuMP is involved, in which Julia is so comically superior to any available alternative that for the most part everyone is forced to admit that Julia is the right choice). If it were not for the microservices world, my colleagues would make it almost impossible for me to use Julia (even though PyCall is excellent and JavaCall is working again in 1.3), but, as it is, nobody really cares. Just about every interaction a program has with the outside world is via HTTP or some SQL query, so as long as these things work, nobody can really object to my using Julia. (Also, probably nobody wants to hear me give a half-hour speech about how terrible Python is if they try to get me to use it.)

In my experience so far, all of the above applies just as well to “production” as to development. My Julia programs live in some docker image that runs somewhere. They fetch data, usually either through HTTP (thanks to AWS services it is almost always possible these days to get data through HTTP) or from PostgreSQL. Testing is pretty much the same as it is with anything else: unit tests on the program itself can be run anywhere the docker image has access to input data, and integration testing is free to treat the docker image as a black box, the same as if it were written in any other language. My one major concern down the line is my colleagues’ increasing use of Databricks, which looks to me like a proprietary horror show. However, I believe that even in the worst-case scenario I’ll always be able to get anything I need through a JDBC driver (which very unfortunately would add the JVM as a dependency to my programs, but it’s not the end of the world).

There is, however, one huge black mark on the whole Julia usage narrative, particularly “in production”: the difficulty of statically compiling binaries. This has been a thorn in my side since I started using the language, and now that I have more stuff “in production” I feel it much more keenly. Particularly if I’m using a docker image, the most I can do before running my program is have the docker build script run “using” in Julia for all the critical packages. This at least generates the *.ji files, but, very painfully, every single time the docker image runs, everything still has to be compiled again at startup. This means there is an awful 30 to 180 s lag before anything in my docker image can run, even if it’s a simple utility. The only alternative right now is PackageCompiler, but this tends to be fiddly and difficult, so it’s only really worth it if you absolutely must run with low latency or if your code doesn’t change for a really long time (unfortunately, in my case everything is under constant development).
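
For anyone wanting to try the same thing, here is a rough sketch of what that build-time step could look like. It is only an illustration, not my exact setup: the script name, the file names (warmup.jl, app_sysimage.so) and the package list (HTTP, LibPQ) are all made up for the example, and the second half assumes a PackageCompiler release of 1.0 or later, where create_sysimage is available.

```julia
# build_precompile.jl (hypothetical name): run once during `docker build`,
# e.g. from a Dockerfile line such as `RUN julia build_precompile.jl`.
using Pkg
using PackageCompiler  # assumed to be listed in the project's dependencies

# Assumes Project.toml and Manifest.toml sit next to this script.
Pkg.activate(@__DIR__)

# Install the exact package versions recorded in the Manifest.
Pkg.instantiate()

# Precompile every dependency so the *.ji cache files are baked into the image.
# This does not remove the compilation lag at container startup.
Pkg.precompile()

# Optional, slower step: build a custom system image so that the listed
# packages are already compiled when Julia starts.
# HTTP and LibPQ are placeholder package choices; warmup.jl is a hypothetical
# script that exercises the code paths you want compiled ahead of time.
create_sysimage(
    [:HTTP, :LibPQ];
    sysimage_path = "app_sysimage.so",
    precompile_execution_file = "warmup.jl",
)

# The container would then start the program with something like:
#   julia --sysimage app_sysimage.so main.jl
```

The first half is cheap and worth doing in any image. The second half is where the fiddliness comes in: the system image has to be rebuilt whenever the listed packages change, which is why it pays off mostly for code that is stable.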

So, in summary, the only special difficulty I find with using Julia in production is the difficulty of compiling static binaries. For the rest, it’s much like using anything else.
