Guidelines on when to optimize code

question

#1

I poked around a bit but did not find guidelines on when a simple opportunity for optimization should be taken. I’m not concerned about end-user code, rather both the core and packages. Perhaps the main issue is code complexity / compilation time vs. efficiency. Does something like this exist ? Or maybe there is an unwritten consensus. It’s a practical question for people who make a PR, but don’t follow the Julia issues and PRs closely. An example: This is the current code for merge! (there are a couple of other methods that are irrelevant here).

function merge!(d::AbstractDict, others::AbstractDict...)
    for other in others
        for (k,v) in other
            d[k] = v
        end
    end
    return d
end

Say for the sake of argument that the following is faster for any collection of AbstractDicts. (In fact it is faster for several cases with not only two, but also three Dicts.)

function merge1!(d::AbstractDict, others::AbstractDict...)
    for other in others
        merge1!(d,other)
    end
    return d
end

function merge1!(d::AbstractDict, other::AbstractDict)
    for (k,v) in other
        d[k] = v
    end
    return d
end

I don’t want to focus too much on merge! here, its just an example. But, I am only interested in cases, like this one, where there is no observable difference. Here are a few choices

  1. Choose the shorter code if the difference in speed is at most 50%. (Or number of allocations, etc.)
  2. If the efficiency factor is constant (not “algorithmic”) and not causing complaints, choose the less complex code until v1.0 is released. The reason is that the implementation and even user-facing API may change. After v1.0, choose the more efficient implementation.
  3. Wait till v1.0, but ask more subtle questions. How often and where is the code called?. E.g. you can accept a lot of complexity to make sin efficient. Or you may want to wait until the user base grows in order to better inform larger-scale design choices. In other words, be conservative with premature optimization.
  4. For examples like the one above, choose the simpler code and wait for a compiler optimization. But, even if the optimization is implemented, how can you be sure it’s applied to your code? I am thinking of the argument for one of the advantages of syntactic loop fusion; you know that the optimization is made. I recall using the C preprocessor because the compiler’s heuristic for inlining was not good enough, and I didn’t want to waste time monitoring and tinkering with flags. In the case above, one could prototype the optimization with macros. Because we have syntactic macros, this can be done safely and you don’t even need to be a compiler-person.

It may be difficult to write a somewhat simple set of guidelines on this.


#2

I would go further. It’s impossible to come up with a simple set of guidelines.

No, this is not even close to universal. For some codes? Sure. For DiffEq’s stochastic differential equation integrators? That’s a tough one. That could be the difference between a 4 day vs 1 week to solve some large stochastic partial differential equations. People expect library code to be optimized, so if we can get 2x faster we should go for it in this case. But for my own code I’ll look at it and go “yeah no”, and throw it on a cluster to come back a week later. The nice thing about Julia is that it gives you the choice.

Scaling doesn’t matter if you have a large constant. Example: GPU computing. That only makes sense in some cases because of its large constant factor.

Agreed here. This is probably just over optimizing. But I think the better question is, does the profile actually show this part as mattering a lot? If you need to cut down on time and your profile shows that a significant amount of time is spent on this sin call, do your own extra tricks (Taylor series around known values near where you are solving it for example). But profile first. If you’re looking to get a 20% increase on something that your code spends 2% of its time, you’re not actually helping anything.

Program the way that is appropriate for your problem, to the efficiency that you need. Anything more is problem dependent which makes it difficult to give any guidelines here.


#3

I would go further. It’s impossible to come up with a simple set of guidelines.

Writing this crossed my mind momentarily. But, I think the tone would have discouraged thinking about the problem. The post is already long, so I had to omit some things. Like, “Obviously, I am not asking for a set of rules that can be applied algorithmically across the board.”

No, this is not even close to universal.

Likewise, I looked here for a compact suggestion of a way of thinking about the problem. The sentence was not meant to be applied literally. I deliberately discarded qualifying phrases that crossed my mind.

Scaling doesn’t matter if you have a large constant

This is related to “not causing complaints”

People expect library code to be optimized …

Yes, this is exactly what I am talking about. By sin I mean the methods everyone get when they use sin(x). And merge! is more or less library code. There is no typical use case. But, obviously it doesn’t make sense to optimize all functions like merge! to the extent that sin is optimized— certainly not at this point. The question remains, which of the two implementations above of merge! is better? Better as library code for everyone. The answer is not too important (unless a specific use case arises). But, the reason for choosing one over the other is important.

I guess the people who are in the best position to think about guidelines for code in Base are the people who wrote it. And they have more pressing things to do.

For myself, I think the best route to understanding the question better is to read the code in ./base.


#4

This Wikipedia summary seems okay: https://en.wikipedia.org/wiki/Program_optimization#When_to_optimize


#5

Yes, this looks a great place to start. The important questions are not Julia specific… Its a big topic.

@ChrisRackauckas or maybe not. The article quotes Knuth

“In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal and I believe the same viewpoint should prevail in software engineering”

The answer is 12%, not 50% :smiley:


#6

I think that a determined and experienced Julia programmer can achieve a 1.5-2x speedup (on average) on just about any piece of code that has not been micro-optimized. Yet most of us avoid this, because

  1. Julia programmers refactor a lot, and micro-optimized code with specific tricks is harder to maintain,
  2. the compiler may be able to do it later on automagically,
  3. specific assumptions about the compiler may not continue to hold, effectively make the code worse later on (this happens rarely).

My impression is that instead of micro-optimizing specific code, Julia has always tried to abstract out the necessary patterns and build them into the language (eg a prominent example is loop fusion, but there are many others). This is usually motivated by the recognition of the trade-offs between speed and code complexity, and the desire to avoid this by making the language smarter.

These language and compiler changes take time, but many of them are in development, or planned eventually. Usually, micro-optimization is deferred unless someone really needs the functionality and is willing to cope with the resulting code complexity. This is usually up to the judgement of the package maintaners.

An interesting exception is that frequently making the code more modular and elegant also makes it faster in Julia. This happens because the compiler copes really well with small functions that do one specific thing in a type stable way. Frequently I find that by making my code more readable, it got faster. This is almost too good to be true :wink: