Optimize Julia for size, and other potential or actual switches

Julia has

-O, --optimize={0,1,2*,3} Set the optimization level (level 3 if -O is used without a level) ($)

I.e. -O2 is the default (unlike in e.g. C/C++). Does anyone often use -O3 (aka plain -O), or the other levels?

It seems like -Os is missing. While Julia's optimization options do not map one-to-one to those of C/C++ compilers such as Clang or GCC, they might behave similarly to Clang's (and GCC's):

man clang

Code Generation Options
-O0, -O1, -O2, -O3, -Ofast, -Os, -Oz, -Og, -O, -O4
Specify which optimization level to use:
-O0 Means “no optimization”: this level compiles the fastest and generates the most debuggable code.

[That’s NOT true for Julia: -O0 still does some optimization, I believe just one (constant propagation, if I recall correctly), and it’s cheap and always a win, so I think that’s ok.]

-O2 Moderate level of optimization which enables most optimizations.

-O3 Like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).

I believe the same is true for Julia’s -O2, and likely for -O3, though I do not know exactly what it adds, for Julia or for C/C++. But this option is passed on to LLVM, Julia’s compiler backend, which is the same backend Clang uses, so it may well do the same thing in Julia.

It’s meant for faster code, which could be larger (potentially smaller too, I'm not sure; there’s not always a correlation). In fact some optimizations make code larger, e.g. loop unrolling, which -O2 already does.

-Ofast Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards.

[Not available in Julia as a global option, and that’s probably for the best; I think all of this may be available as local options (e.g. @fastmath).]

What intrigues me most in Clang are -Os and the more aggressive -Oz. Do people use these often with Clang and GCC, and why are they not supported in Julia (could support be added easily, by just passing the flag through to LLVM)?

-Os Like -O2 with extra optimizations to reduce code size.

-Oz Like -Os (and thus -O2), but reduces code size further.

-Og Like -O1. In future versions, this option might disable different optimizations in order to improve debuggability.

I wouldn’t expect any miracles from -Os or -Oz if they were available in Julia, though code-size reductions similar to those for C and C++ should be possible. Note that you can already compile Julia code into apps/system images (e.g. with PackageCompiler.jl) AND optimize for size there. I.e. the most worrying size is the large overhead of Julia’s stdlibs, e.g. LinearAlgebra and OpenBLAS and more that can be dropped, even the (larger) LLVM, though not always.

man gcc shows similar options and is specific about what exactly they mean. Note that most would not apply to Julia, since they are compiler-specific, though some might:

-O3 Optimize yet more. -O3 turns on all optimizations specified by -O2 and also turns on the following optimization flags:

       -fgcse-after-reload -fipa-cp-clone -floop-interchange -floop-unroll-and-jam -fpeel-loops -fpredictive-commoning -fsplit-loops -fsplit-paths -ftree-loop-distribution -ftree-loop-vectorize -ftree-partial-pre
       -ftree-slp-vectorize -funswitch-loops -fvect-cost-model -fvect-cost-model=dynamic -fversion-loops-for-strides

   -O0 Reduce compilation time and make debugging produce the expected results.  This is the default.

   -Os Optimize for size.  -Os enables all -O2 optimizations except those that often increase code size:

       -falign-functions  -falign-jumps -falign-labels  -falign-loops -fprefetch-loop-arrays  -freorder-blocks-algorithm=stc

       It also enables -finline-functions, causes the compiler to tune for code size rather than execution speed, and performs further optimizations designed to reduce code size.

[That’s intriguing; I would have thought -Os would NOT inline. There are many other inlining options available, e.g. -minline-all-stringops and -minline-stringops-dynamically.]

   -Ofast Disregard strict standards compliance.  -Ofast enables all -O3 optimizations.  It also enables optimizations that are not valid for all standard-compliant programs.  It turns on -ffast-math, -fallow-store-data-races and the Fortran-specific -fstack-arrays, unless -fmax-stack-var-size is specified, and -fno-protect-parens.

In Julia you can turn inlining on or off globally (and locally in code), and it’s on by default. It’s potentially faster, though not always, and it’s somewhat problematic: I believe the recompilations happen because inlining is used.

Inlining is off at -O0 in both Julia and GCC, and GCC has a restricted form, -finline-functions-called-once, which is one of the optimizations enabled at -O1. Inlining more than that means code expansion, and that restricted form would be an interesting option for Julia if -O1 doesn’t do it currently.

GCC has -finline-functions and -finline-small-functions, and BOTH are on at -O2, which is strange, since I would have thought the former applies to both large and small functions.

Some more interesting options I see:

-finline-limit=n By default, GCC limits the size of functions that can be inlined. This flag allows coarse control of this limit. n is the size of functions that can be inlined in number of pseudo instructions.

-fpartial-inlining Inline parts of functions. This option has any effect only when inlining itself is turned on by the -finline-functions or -finline-small-functions options.

       Enabled at levels -O2, -O3, -Os.
       -fno-implement-inlines To save space, do not emit out-of-line copies of inline functions controlled by "#pragma implementation".  This causes linker errors if these functions are not inlined everywhere they are called.

-Winline Warn if a function that is declared as inline cannot be inlined. Even with this option, the compiler does not warn about failures to inline functions declared in system headers.

       The compiler uses a variety of heuristics to determine whether or not to inline a function.  For example, the compiler takes into account the size of the function being inlined and the amount of inlining that has already
       been done in the current function.  Therefore, seemingly insignificant changes in the source program can cause the warnings produced by -Winline to appear or disappear.

-fno-asm Do not recognize “asm”, “inline” or “typeof” as a keyword, so that code can use these words as identifiers. You can use the keywords “__asm__”, “__inline__” and “__typeof__” instead.

       -fno-implicit-inline-templates Don't emit code for implicit instantiations of inline templates, either.  The default is to handle inlines differently so that compiles with and without optimization need the same set of explicit instantiations.

Intriguing: is this wrong on macOS/Darwin, but only for GCC, not for Julia (or Clang)?

-fno-math-errno Do not set “errno” after calling math functions that are executed with a single instruction, e.g., “sqrt”. A program that relies on IEEE exceptions for math error handling may want to use this flag for speed while maintaining IEEE arithmetic compatibility.

       This option is not turned on by any -O option since it can result in incorrect output for programs that depend on an exact implementation of IEEE or ISO rules/specifications for math functions. It may, however, yield
       faster code for programs that do not require the guarantees of these specifications.

       The default is -fmath-errno.

       On Darwin systems, the math library never sets "errno".  There is therefore no reason for the compiler to consider the possibility that it might, and -fno-math-errno is the default.

[Julia doesn’t emit just a single instruction for sqrt, but that faster option is possible, and I believe available in a package. I’m not proposing such extreme code-size optimization for -Os or -Oz.]

-freciprocal-math Allow the reciprocal of a value to be used instead of dividing by the value if this enables optimizations. For example “x / y” can be replaced with “x * (1/y)”, which is useful if “(1/y)” is subject to common subexpression elimination. Note that this loses precision and increases the number of flops operating on the value.

       The default is -fno-reciprocal-math.

The default is the same in Julia (though @fastmath can allow such rewrites locally).

The option -fno-gnu89-inline explicitly tells GCC to use the C99 semantics for “inline” when in C99 or gnu99 mode (i.e., it specifies the default behavior).