I’ve been looking up compiler latency in Julia after reading the blog post on decreasing method invalidation in version 1.6 [Analyzing sources of compiler latency in Julia: method invalidations]. I think I sort of got the gist of the progress on “time to first plot” so far, but there are a couple of things I still don’t really get because of my own inexperience:
How does compilation time in Julia compare to other compiled languages? Do method invalidation and recompilation play a role in other languages, or are they particular to Julia? I haven’t actually used a compiled language before Julia, so I was hoping someone who has used one could give a fair comparison.
After reading about the reduction of method invalidations that happen when combining common packages, am I right in thinking that developers can only reduce method invalidation for combinations of packages they are aware of? Or are there general principles to prevent method invalidation for arbitrary combinations?
The answer to 1 is complicated for a number of reasons. The TL;DR is that Julia has faster compilation than most statically compiled languages, but that work doesn’t get saved between runs, so it’s more critical that compilation is fast than in a static language where you only ever compile once.
Yes: fix (1) inference problems and (2) any cases of type piracy. Together those account for the vast majority of invalidation vulnerabilities, and once fixed they protect you against almost anything other than piracy committed by other packages.
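For anyone who hasn’t bumped into the term before, here’s a minimal made-up sketch of the difference between a legitimate extension and type piracy (the package and type names are invented):

```julia
module MyPkg

struct MyList
    items::Vector{Int}
end

# Fine: extending a Base function for a type this package owns.
Base.length(l::MyList) = length(l.items)

# Type piracy: adding a method for a function *and* argument types you don't own.
# Something like this can invalidate compiled code anywhere `length` is called on an Int:
# Base.length(x::Int) = 1    # don't do this!

end
```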
There may be inference problems you can’t practically fix; tasks like parsing, for example, are often inherently non-inferrable. That’s one reason why many packages still trigger a small number of invalidations in Base and the standard libraries. When that happens, your focus generally needs to shift to “damage control”: doing what you can to prevent inference failures from propagating through long call chains.

For example, several methods in Base now have annotations like x + Int(y)::Int. That might look crazy, but if you really have no idea what type y is, then you don’t know which method Int(y) will call, and if there are enough options (currently, more than 3) Julia won’t go to the effort of checking whether all such methods return an Int. Hence Int(y)::Int both performs the conversion and insists that the return value is an Int (Julia will throw an easy-to-understand error if not). This truncates the “damage” of not being able to know what kind of object y actually is: it prevents additional sources of vulnerability in code that uses the result of this computation.

This kind of change requires a bit more expertise to make, and should only be done in cases where there’s good reason to believe the code must sometimes be called under conditions where complete inference cannot be guaranteed. When you can, it’s much better to fix the type instability in whatever called this method.
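To make that pattern concrete, here’s a minimal sketch (not actual Base code; the function name is made up):

```julia
# `y` arrives from non-inferrable code (e.g. the output of parsing), so its type is unknown.
function add_parsed(x::Int, y)
    # Int(y)::Int performs the conversion and asserts, with a runtime check, that the
    # result really is an Int. Without the assertion, everything downstream would be
    # inferred as Any whenever inference can't enumerate every applicable Int(...) method.
    n = Int(y)::Int
    return x + n        # from here on, inference knows it is working with Ints
end

add_parsed(1, 0x02)     # 3
```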
But my analyses of both Julia and several packages suggest that there are a lot of readily fixable inference problems out there waiting to be discovered. It’s probably more an issue for code that has to handle diverse data than for low-level numerical algorithms.
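As an illustration of the kind of thing I mean (the types and functions here are made up), @code_warntype is a quick way to spot one of the most common readily fixable problems, an abstractly typed field:

```julia
using InteractiveUtils          # provides @code_warntype outside the REPL

struct LooseConfig
    param::Number               # abstract field type: inference only knows "some Number"
end
scale(c::LooseConfig, x) = c.param * x
@code_warntype scale(LooseConfig(2.0), 3.0)   # the result shows up as Any (flagged in red)

# The fix: parameterize the field so the concrete type is visible to inference.
struct TightConfig{T<:Number}
    param::T
end
scale(c::TightConfig, x) = c.param * x
@code_warntype scale(TightConfig(2.0), 3.0)   # now fully inferred as Float64
```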
Good to hear that I can use the classic tips of avoiding type instability and type piracy.
Follow-up question: should I also avoid extending methods from other people’s packages on my own types if I want to prevent method invalidations? I know that doesn’t count as type piracy, but it’s still adding a method to a function. I’m not certain at all, but I’m wondering if even that would cause invalidations.
From my limited knowledge, yes, this may lead to invalidations. Previously compiled methods in the package that call the function you extended may need to be recompiled. For example, if those methods take arguments typed as Any, don’t specialize on them, and pass them along to the function you extended, then once your code is loaded your type can reach that call and dispatch, correctly, to your new method; before you loaded your code that wasn’t a possibility, and the compiled code may have been optimized not to consider it. However, it is exactly this mechanism that allows many beautiful designs in Julia, so the point is not to spurn it, but to avoid abusing it or using it in unintended ways.
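A self-contained toy version of that scenario (all names here are invented):

```julia
module SomePkg
describe(x) = "unknown"
# `xs` is a Vector{Any}, so each call to `describe` is resolved at run time; the compiled
# code for `report` bakes in assumptions about which `describe` methods exist right now.
report(xs::Vector{Any}) = join([describe(x) for x in xs], ", ")
end

SomePkg.report(Any[1, "a"])                # compiles `report` against the current methods

module MyPkg
using ..SomePkg
struct MyThing end
# Not piracy (MyThing is ours), but it adds a method to SomePkg.describe, which can
# invalidate the already-compiled `report`; it is simply recompiled on its next call.
SomePkg.describe(::MyThing) = "mine"
end

SomePkg.report(Any[1, MyPkg.MyThing()])    # "unknown, mine"
```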
Don’t worry about that case until/unless it becomes a problem. And if it becomes a problem, the fix isn’t to change your package, it’s to change the upstream package. So really don’t worry about it.
I don’t think anyone has addressed this question yet, but it’s my favorite one. To my knowledge, Julia is the only language with this “problem” — and I’d love to be corrected otherwise. I put “problem” in scare quotes because it’s really a feature. It’s a feature that simultaneously allows for some extraordinary optimizations (like inlining user functions into standard libraries) all while preserving completely dynamic semantics! It’s a really amazing capability and it’s one of those core pieces of technology that makes Julia uniquely fast.
This is one of the key Julia features I try to explain to Fortran/C++ modelling colleagues… Your random script will be inlined into other packages and base! Everything automatically recompiles just to suit your needs! But I probably scare them off with how exciting that is to me.
As Matt & Henrique say, as long as the methods are non-pirating you should feel free to extend other functions—it’s a key part of Julia’s magic. If it does cause invalidations, it’s because of an inference problem somewhere. Fixing the inference problem will eliminate the invalidation.
Adding an even more obvious advantage of invalidation: it’s the key trait underlying our dynamic, incrementally refined development process. Invalidations were introduced as a necessary part of the fix for the infamous #265, and they’re what makes a package like Revise possible. Many languages require that the entire “world” be available before your code starts to compile, and they don’t take kindly to sudden updates to the state of the code midway through the process. In contrast, Julia allows you to evolve your code and, in most cases, arrive at the same outcome no matter how you got there within the confines of a single session.
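A tiny single-session illustration of what that buys you (this is exactly the behavior the #265 fix enabled):

```julia
f(x) = 2x
g(x) = f(x) + 1
g(3)           # 7; compiles g against the current definition of f (possibly inlining it)

f(x) = 10x     # redefining f invalidates the compiled g ...
g(3)           # ... so g is recompiled and correctly returns 31, not a stale 7
```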