One of my advisors is considering transitioning towards Julia from a traditional HPC background using C++ and Python. After a couple days playing around, he seems quite impressed by the flexibility of Julia’s type system. But, he hasn’t done any “serious numerical work” (his words), and he is wondering what problems I have encountered using Julia for this “serious work”. I have my list, but I’m interested to hear what other people think. For background, we do plasma Particle-in-Cell simulations.
Here’s what I came up with:
Time to first plot. Can be solved by working in a REPL, or using PackageCompiler. But still a worse workflow than, say, Python.
Smaller ecosystem, but most important tools have been implemented, and more every day.
Getting a good workflow on clusters can be tricky. Need to get appropriate versions installed, install packages, precompile serially, then either start Julia with a machine file, or use MPI. Of course, all these steps are true for other languages, but someone has often done some of the hard work for you already.
This is much less an issue in the most recent Julia versions (1.5 and 1.6). Plotting from scratch (new Julia session) takes ~7s for me in 1.6 beta - not really problematic in my opinion.
In addition, I suggest using Revise.jl in your development workflow for automatic reloading.
If a Julia library is not available there is always the option to use Python libraries with PyCall.jl or call C libraries.
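As a minimal sketch of the "call C libraries" route, the snippet below calls `strlen` from the C standard library through Julia's built-in `ccall`. The choice of `strlen` and the variable name `n` are only illustrative assumptions; a real use case would name the shared library that provides the function.

```julia
# Call the C standard library's `strlen` directly; no wrapper package is needed.
# Arguments: function symbol, return type, tuple of argument types, then the values.
n = ccall(:strlen, Csize_t, (Cstring,), "plasma")

println(n)  # number of bytes in the string, returned as a Csize_t
```

PyCall.jl plays the same role for Python libraries, but it is an external package and needs a working Python installation, so it is not shown here.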
Right - you could work around this by defining a “normal” conda environment (with fixed environment.yaml file) and start your Julia session in a terminal where it is activated. But this is not very convenient.
I have never shied away from using terminals in my life, and even so, I was won over by Jupyter. For anything that takes more than a couple of lines (i.e., anything that is not just manually testing a function that I am changing in my own package while Revise.jl keeps the REPL up to date with the package code), like data exploration, I use Jupyter. It solves the tedious part of the process: hitting arrow up or searching for already executed commands to re-run or edit.
I think the main issue I have is encountering bugs, redesigns, or performance problems with libraries. Julia’s package ecosystem seems pretty nice at first, but when you start to use things in a “more serious” manner you often stumble on some issues, and many packages are under-staffed (if staffed at all), so you often have to contribute or fix things yourself. Julia is nice enough that it can be done, but it still takes time and energy that is not spent on your work.
This is very true, especially if you make more advanced use of a package than most of its users. I love JuMP, but the solver wrapper packages made some questionable (in my opinion) design choices that hurt performance in less used features. When I started using these features, I had to contribute patches that greatly improved their performance. I am also not the biggest fan of DataStructures.jl. I know they are doing their best, but it kind of pales in comparison to what is available in other languages.
I think I can honestly say that there isn’t a single thing that I have encountered that I would consider a barrier to “serious work”. Sure, certain things seem slow to pre-compile or run for the first time, but to me, waiting a few seconds for one plot to show up one time is a non-issue. I can see how it might seem like a huge deal for people coming straight from Python, but after a year of using Julia exclusively I don’t really notice it anymore. Some of it is just understanding that the trade-off is worth it, and a large part of that is all the amazing work being done by the community to speed things up.
The reasons I *DO* like Julia for serious work:
Julia Base gets better with each release.
Nearly everything I need is somewhere in the registry.
The community is helpful and responsive.
It is fast.
Multiple dispatch is a better paradigm than standard OOP (IMHO).
It is possible to write functions that look like equations for more readable code.
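To make the last two points concrete, here is a small sketch in the spirit of a particle code. The names `γ`, `kinetic_energy`, `Species`, `Electron`, `Proton` and `charge` are hypothetical and chosen only for illustration; they do not come from any package mentioned in this thread.

```julia
const c  = 299_792_458.0      # speed of light in m/s
const qe = 1.602176634e-19    # elementary charge in C

# One-line definitions with Unicode names read much like the formulas themselves.
γ(v) = 1 / √(1 - (v / c)^2)                 # Lorentz factor
kinetic_energy(m, v) = (γ(v) - 1) * m * c^2  # relativistic kinetic energy

# Multiple dispatch: the method is selected from the argument's type,
# without attaching the function to a class.
abstract type Species end
struct Electron <: Species end
struct Proton   <: Species end

charge(::Electron) = -qe
charge(::Proton)   =  qe

println(γ(0.5c))
println(charge(Electron()))
```

Adding a new species only requires a new `struct` and a new `charge` method; existing code that calls `charge` does not need to change.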
I’ll probably think of more after I hit the “submit” button, but that’s a start.
I miss the possibility of using the @code_warntype macro in scopes other than the global one. The only time I felt that something was taking longer with Julia than with Fortran was when I had a type instability that was hard to trace, and I had to run the pieces of code one by one until I reached the state of the system where the instability actually manifested itself in a function call. Since this is specific to Julia (relative to Fortran, at least), I put it on Julia’s account.
Otherwise, I find everything else much better than the workflows I had before, including how easy it is to distribute packages. And the final performance of my packages turns out to have improved relative to Fortran, mostly because it is easier and more fun to work on the code.
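For readers who have not met this kind of problem, here is a minimal sketch of a type instability and two ways to look for it. The functions `unstable`, `stable` and `inspect` are hypothetical examples, not code from the post above. The sketch assumes that `@code_warntype` only needs the runtime types of the call's arguments, so it can be placed inside a function body, and it uses `Test.@inferred` to turn the check into a test.

```julia
using Test, InteractiveUtils   # InteractiveUtils provides @code_warntype outside the REPL

# Returns an Int for negative input and a Float64 otherwise: type-unstable.
unstable(x) = x < 0 ? 0 : sqrt(x)

# Both branches return a floating-point value: type-stable.
stable(x) = x < 0 ? zero(float(x)) : sqrt(float(x))

# The macro can be called from inside a function, using the values available there.
function inspect(values)
    x = first(values)
    @code_warntype unstable(x)   # prints the inferred types, including the Union return type
    return nothing
end

inspect([2.0, 3.0])

# @inferred throws when the actual return type differs from the inferred one.
@test_throws ErrorException @inferred unstable(2.0)
@test @inferred(stable(2.0)) ≈ sqrt(2.0)
```

Putting `@inferred` checks on performance-critical functions into the test suite is one way to catch such instabilities before they have to be traced by hand.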
I understand where you come from, but from an outsider’s point of view 7s to plot really isn’t acceptable in 2021, especially for a language in which you have to restart the interpreter every now and then when you change definitions. Python takes much less; Matlab’s startup time is comparable but typically it’s something you only do once at the beginning of your work session.
7s is less than the time I need to write a simple line of code. And you do not need to restart very often thanks to Revise.
This may be slightly annoying for the user, but I doubt that it will lead to any significant loss of productivity. Other factors, like how easily and quickly you can code in a language and run times (for computationally intensive tasks), are much more important.
I’ve only been exploring Julia as an alternative to the same Python plus C++ combination for a few months now and I like a lot of what it offers. However, the one thing I’m starting to dislike is that because Julia is a dynamically-typed language you don’t get the type-checking benefit that a compiler for a statically-typed language can provide you, especially when you’re doing code refactoring. It’s just so much easier with a language like C++ and the use of incremental compilation to quickly weed out incorrect use of types or fields, typos and other mistakes. With any non-trivial Julia program similar errors will only show up during execution and usually not in the first two seconds of running. Sure, I can write unit tests to make certain all code paths are covered and working correctly, but even executing such a test suite after every change will take ages. The compile step that happens with every Julia execution just makes iteration during development extremely slow, even with -O0 --compile=min. Somehow it drags my brain with it into sleep mode, due to the sluggishness of the whole workflow. Perhaps Revise can help here, haven’t looked into it in detail.
I guess this is also a matter of personal coding style. Normally I like to edit code in C++ and do a quick compile to see everything’s still correct as far as the compiler is concerned, do another change, compile, etc and only do a longer run of the code (say of a few minutes) when I’m relatively certain it will do something sensible.
Hello Adam! Just so you’re aware, there is an “HPC Community Call” that we have about once a month, where we talk about some of these pain points and how we’re going to improve them, specifically in the context of HPC and clusters. If you’re interested in joining, I suggest joining the #hpc channel on the JuliaLang slack and we’ll get you hooked up with a calendar invite.
As for the points you mentioned:
Time to first plot should be greatly improved with 1.6; I would expect to see a few seconds shaved off of whatever timing you were getting in 1.5.
On the smaller ecosystem: as everyone here has attested, this will come with time, and is both a blessing and a curse for those who have strong opinions about how software should be written.
On cluster workflows: this is something we’re actively working on in the #hpc channel, but MPI and such is difficult to get working while maintaining the portability and compatibility guarantees that we like to have in the Julia world. I’m confident we’ll get there eventually, but it might take a little while. If you’re interested in this kind of stuff, I highly encourage you to get involved!
I agree with @paulmelis. I like compiled languages. Once my program fully compiles, it has a shot at working. With Julia I have to ensure that I test each and every path to make sure I didn’t mistype a function name, provide incorrect arguments, or reference a variable that doesn’t exist. Sometimes getting to those paths (if I don’t write test cases for EVERYTHING) is tricky and time consuming.
Interesting, I also did that with Fortran, but I don’t remember it with any pleasure; on the contrary. The moment a minimal “working” chunk of code compiled was one of relief, followed by the sentiment of “well, I finally have something to start searching for the bugs in”. With Julia I can test much smaller pieces of code. I don’t see compilation solving the problem of exercising every code path, except for the simplest errors.
(Everyone should look carefully at Revise! It completely changes the development workflow and, even with the struct definition limitation, has advantages relative to compiled languages, I think.)
Funny how you have the exact opposite feeling from mine when a piece of code successfully compiles. For me, it means the code is correct at least at the level of language syntax and semantics, which is the stepping-stone for getting it to do what it should do (i.e. correctness). Without that check I have to deal with two levels of uncertainty and possible errors (1. syntax+semantics, 2. correctness).
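A small sketch of the kind of error meant here: the hypothetical function `clamp_density` below contains a misspelled call on a rarely taken branch. The definition is accepted without complaint, and the mistake only surfaces when that branch actually runs.

```julia
# `zeero` is a deliberate typo for `zero`; defining the function raises no error.
function clamp_density(n)
    if n < 0
        return zeero(n)   # only reached for negative input
    end
    return n
end

ok = clamp_density(1.0)   # the common path works fine

# The rare path fails at run time with an UndefVarError.
err = try
    clamp_density(-1.0)
    nothing
catch e
    e
end

println(typeof(err))
```

A test that exercises the negative branch would catch this, which is exactly the extra coverage burden described above.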
I’d be interested to know more on your approach for this. I haven’t found a good way to do unit testing of a Julia package yet, for doing fine-grained testing of part of a package without loading up the whole package (as the latter causes longer compile times when you might want to run only a specific test).
It seems the idiomatic file organization for a package is to have one file named after the package which contains a module ... end with include()s in the module scope for most of the code. With Python a separate module (file) always declares its dependencies, making it easy to test one module/file separately. But with a Julia file that’s included as part of a package, how should I approach the same granularity in testing? I’d like to avoid loading (and thus having to compile) the full package in order to test only a small subset, as that would be a disadvantage in terms of quick development iterations, as I mentioned above.
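One possible approach, offered only as a sketch and not as an established convention, is to `include` a single source file into a throwaway module and test it there. The file name `pusher.jl` and the function `advance` are hypothetical stand-ins; the file is written to a temporary directory here only so that the example is self-contained.

```julia
using Test

# Stand-in for one source file of a package (hypothetical name and content).
src = joinpath(mktempdir(), "pusher.jl")
write(src, """
advance(x, v, dt) = x + v * dt
""")

# Load only that file into a fresh module instead of loading the whole package.
sandbox = Module(:Sandbox)
Base.include(sandbox, src)

@testset "pusher.jl in isolation" begin
    @test sandbox.advance(1.0, 2.0, 0.5) == 2.0
end
```

This only works directly for files that do not rely on definitions from other files in the package; otherwise those files would have to be included into the same module first, in the right order.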
Running unit tests in a statically-typed language like C++ is mostly for having runtime coverage of the code, as the static compilation done before you can even run the code will take care of catching all issues related to static typing. For the latter, unit tests (i.e. running code) cannot provide any benefit anyway, as the code either compiles and can be run or doesn’t compile (e.g. type error somewhere) and so cannot run. Issues relating to dynamic typing, like not checking the result of a dynamic_cast<>, can, of course, be caught with unit tests.
Yeah, I really should start to experiment with it.