Compiling Julia using LTO+PGO

To what extent would this affect performance for Julia? And which aspects fall outside what the benchmark framework measures, e.g. not only throughput but also latency (time to first paint, which hurts) and RAM/power usage? How close is this to being set up/built?

- This may be of more use for release builds.

  • PGO requires a profile-generating run (how could this be improved?)
  • LTO has existed for years (fewer bugs by now, see the gcc-6 changelog)

Experiments (Linux): much smaller binary sizes (dead code elimination), but not much difference in performance. Can that be improved?

  • sys.so 186.6→137.5MB
  • libjulia.so.1.4 32.3→5.6MB
    but some are not affected:
  • libLLVM-8jl.so 56.9→56.9MB
  • libopenblas64_.0.3.5.so 30.6→30.6MB
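
To put the sizes above in relative terms, here is a minimal Python sketch that computes the percentage reduction per library. The numbers are the ones from the list; the helper names `reduction_percent` and `report` are made up for illustration.

```python
# Sizes in MB (before, after), copied from the list above.
SIZES_MB = {
    "sys.so": (186.6, 137.5),
    "libjulia.so.1.4": (32.3, 5.6),
    "libLLVM-8jl.so": (56.9, 56.9),
    "libopenblas64_.0.3.5.so": (30.6, 30.6),
}


def reduction_percent(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


def report(sizes):
    """Build one formatted line per library."""
    return [
        f"{name}: {before} -> {after} MB "
        f"({reduction_percent(before, after):.1f}% smaller)"
        for name, (before, after) in sizes.items()
    ]


if __name__ == "__main__":
    for line in report(SIZES_MB):
        print(line)
```

So sys.so shrinks by roughly a quarter (about 26%) and libjulia by more than 80%, while the LLVM and OpenBLAS libraries are unchanged.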

Is there a better way than adding -fprofile-generate, then -fprofile-use, together with -O3 -march=native -flto, to the (C,CXX,LD)FLAGS environment variables, plus using a modern compiler (gcc 8+)?

Profiled (PGO) builds usually rely on a training run that exercises the code with representative coverage, so that the profile reflects real use (e.g. the CPython build has a configure option for this, --enable-optimizations).
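
The generate / train / use cycle described above could be sketched roughly like this in Python. It is only a dry run that prints the commands, nothing gets built. The flags are the ones mentioned in this post; whether Julia's Makefiles actually pick these environment variables up for every component is an assumption I have not verified, and `./julia training_workload.jl` is a hypothetical training command.

```python
import shlex

# Flags mentioned in this post. That the build honors them for every
# component is an assumption, not something verified here.
BASE_FLAGS = ["-O3", "-march=native", "-flto"]
PHASE_FLAGS = {
    "generate": ["-fprofile-generate"],  # instrumented build, records a profile when run
    "use": ["-fprofile-use"],            # optimized build, reads the recorded profile
}


def build_env(phase: str) -> dict:
    """Return the (C,CXX,LD)FLAGS settings for one PGO phase."""
    flags = " ".join(BASE_FLAGS + PHASE_FLAGS[phase])
    return {"CFLAGS": flags, "CXXFLAGS": flags, "LDFLAGS": flags}


def plan(training_cmd: list) -> list:
    """Return the shell commands for the whole cycle (dry run, nothing is executed)."""

    def with_env(phase: str, cmd: str) -> str:
        env = " ".join(f"{k}={shlex.quote(v)}" for k, v in build_env(phase).items())
        return f"{env} {cmd}"

    return [
        with_env("generate", "make"),  # 1. instrumented build
        shlex.join(training_cmd),      # 2. training run with a representative workload
        with_env("use", "make"),       # 3. rebuild using the profile; forcing the
                                       #    rebuild depends on the build system
    ]


if __name__ == "__main__":
    # Hypothetical training command; a real one should resemble typical usage.
    for step in plan(["./julia", "training_workload.jl"]):
        print(step)
```

The quality of step 2 matters most: if the training workload is not representative, the second build has little useful information to optimize with.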

Disclaimer: I'm a non-expert, without much background in certain kinds of programming (perhaps irrelevant?).

various links:
https://www.phoronix.com/scan.php?page=news_item&px=Fedora-32-LTO-Packages
https://www.phoronix.com/scan.php?page=news_item&px=Mesa-2020-PGO-LTO-Builds

Some performance benchmarks:

[image: performance benchmarks]

Why not / hindrances: LTO seems to make debugging harder, and PGO means compiling twice (and if the profile generation is unrepresentative, there may be little improvement as a result). So it would help to find ways to improve ease of use, and perhaps to do this only for release builds, with more emphasis on the programs that use up the most “CPU time” (effort).

But for people not building Julia themselves, a major complaint I hear is the time-to-first-paint. Would this have an impact, and how much?
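
One way to get a rough number for this would be to time a fresh process from the outside, once with a stock build and once with an LTO+PGO build. Below is a minimal Python sketch of such a harness. The default command just starts the Python interpreter as a stand-in; the Julia command in the comment is hypothetical and assumes a plotting package such as Plots is installed.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, repeats=5):
    """Run `cmd` in a fresh process `repeats` times; return wall-clock seconds per run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(
            cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Median is less sensitive to a slow, cold first run than the mean."""
    return {
        "median_s": statistics.median(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }


if __name__ == "__main__":
    # Hypothetical usage, pass the command to time as arguments, e.g.:
    #   python time_startup.py path/to/julia -e "using Plots; plot(rand(10))"
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(summarize(time_command(cmd)))
```

Because every run is a new process, the timing includes startup and any compilation latency, which is the part people complain about, rather than steady-state throughput.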

I notice the LLVM parts don’t seem to be impacted (same size), and I don’t know enough about the build process to change that. Would this be of use? https://github.com/facebookincubator/BOLT/blob/master/docs/OptimizingClang.md
