Warning: rant below.
Yeah, but performance is the #1 reason most of us came to Julia in the first place, and it has a major impact on how we structure our code and types. If someone’s code took 1s in Julia 1.0, but 100s in 1.1, then it would get filed as a performance regression, and it would be promptly fixed in a patch release AFAICT. Do you really have that much compiler freedom? Have you used it? When has there been a major performance regression on a commonly used idiom?
I’ve seen Jeff’s threat of inferring a lot of stuff as Any, to reduce compiler latency. As someone managing rather complex, performance-sensitive code with missings, units, and parametric types, I worry. If Julia 1.2 is suddenly 10X worse because of some failed inference, it might lock us on Julia 1.1 for a while, until I get the courage to dive into the code and satisfy the compiler’s new, unstated demands.
Furthermore, the “compiler freedom” idea justifies/explains the lack of information. There are no user-facing docs for constant propagation (that I know of) because “it’s an implementation detail subject to change at any time”. So if I want to write high-performance, generic code, the only solution is to hang out on Discourse, GitHub and Slack to soak in the occasional bubbles of wisdom that percolate from the dev team.
I once had a function that was:
- type-stable on vectors of floats and missings,
- type-stable on vectors of unitful quantities,
- unstable on vectors of unitful quantities and missings.
IIRC it required a significant code refactoring to solve the issue. How was I to know in advance?
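For what it's worth, the only way I know to catch this kind of thing early is to test inference explicitly for each element type I care about. Here is a minimal sketch using `@inferred` from the `Test` standard library. The function `sum_skipping_missing` is a made-up stand-in, not my original code, and I left out the unitful cases because they need the Unitful package; the same check would apply to them.

```julia
using Test

# Hypothetical stand-in for the function described above.
function sum_skipping_missing(xs)
    # Start from a zero of the non-missing element type so the accumulator stays concrete.
    total = zero(nonmissingtype(eltype(xs)))
    for x in xs
        x === missing && continue   # the `===` check lets inference narrow `x`
        total += x
    end
    return total
end

floats = Union{Float64,Missing}[1.0, missing, 2.5]
ints   = Union{Int,Missing}[1, missing, 3]

# `@inferred` throws if the inferred return type differs from the actual one,
# so a test suite can run one such check per element type of interest.
@inferred sum_skipping_missing(floats)
@inferred sum_skipping_missing(ints)
```

This only tells you about the combinations you thought to test, on the Julia version you ran it on, which is kind of my point.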
I’m just ranting here. I don’t have a solution. An optimizing compiler is an amazingly complex juggling act, and Julia is one of the best systems I’ve ever used. I get why the dev team is enthusiastic about compiler freedom. I can see the upside. However, as a performance-seeking user, it is frequently frustrating.
A lot of the pain comes from even identifying where the bad performance comes from. Tooling (like Traceur?) helps. @code_warntype was great in 0.6, but it’s a lot more cryptic in 1.0, and I don’t know if I can trust its output (constants aren’t propagated, are they?). Also: @type_stable blocks.
EDIT: testing for perf. regression is also messy, but it just occurred to me that if there were a @count_dynamic_calls foo(...) macro, it could be a perfectly deterministic way of quantifying performance for testing. And it sounds very doable. Dynamic calls are already slowish, so incrementing a counter should not be an issue.
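As far as I know no such macro exists, so the closest thing I can sketch today is a different technique: comparing allocation counts with `@allocated`. Dynamic calls usually box their results, so allocated bytes are a rough proxy and, unlike timings, fairly repeatable after a warm-up call. The types `Loose` and `Tight` and the helper `allocated_bytes` below are hypothetical names made up for illustration.

```julia
# Hypothetical element types: `Loose` hides its payload behind `Any`,
# so operations on `s.x` cannot be resolved at compile time.
struct Loose
    x::Any
end

struct Tight
    x::Float64
end

function total(v)
    acc = 0.0
    for s in v
        # The assertion keeps `acc` concretely typed even when `s.x` is `Any`.
        acc += (s.x * 2)::Float64
    end
    return acc
end

# Measure inside a function, after a warm-up call, so compilation is excluded.
function allocated_bytes(f, arg)
    f(arg)                      # warm-up: make sure `f` is compiled for this argument
    return @allocated f(arg)
end

loose = [Loose(1.0), Loose(2.0)]
tight = [Tight(1.0), Tight(2.0)]

loose_bytes = allocated_bytes(total, loose)
tight_bytes = allocated_bytes(total, tight)
```

I would expect `tight_bytes` to be zero and `loose_bytes` to be positive, so a test could assert something like `tight_bytes == 0`. It is only a proxy though: code can allocate without any dynamic call, and a dynamic call does not always allocate, which is why a real counter would be nicer.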