Since we’re going to stick with LLVM 3.9 for Julia 1.0, CUDA support is falling behind a little. Not only is NVPTX still under heavy development, NVIDIA also regularly adds new intrinsics with new CUDA versions (eg. CUDA 9.0 adds synchronizing versions of many intrinsics which have to be used on Volta GPUs).
For now, I have been backporting only the LLVM features I really need, and try to work around other issues in CUDAnative.jl (eg. emit inline assembly using llvmcall). This is quite a hassle.
Any ideas for a better solution? I can’t expect users to build Julia with LLVM_VER=svn. Would it be OK to backport much larger patches, adding features from LLVM master to our 3.9 patch set? Even though they would be limited to the NVPTX back-end, it would eg. make it harder to support multiple LLVM versions (patch breakage, etc).
We had the most issue with the 3.3 → 3.7.1 upgrade but that’s just because of a major API change in LLVM. Nothing like that has happened after 3.7 yet and AFAICT the 3.7 → 3.9 upgrade was very smooth.
Unless there’s significantly more features added in the 6.0 cycle, I imaging that’s requires much less backporting effort. Both from the number of patches, and from how much other code have been changed.
Also, the thread about getting LLVM compilation performance good again happens in 3.9 time frame so I expect the compilation performance to not get worse as much after that. But in the end, since I’m not using the generic binary I guess I don’t care that much which LLVM is used by default… It mainly affect how much I want to help getting LLVM 5.0 working for use cases I don’t have.
Yes: we don’t have time to deal with the inevitable compile time regressions and other nasty compilation bugs that are the inevitable fallout from upgrading LLVM for 1.0. Upgrading LLVM to 5.0/6.0 (and all our other dependencies) can and should be a top priority in 1.1 since infrastructure upgrades have no user-visible impact and post-1.0 we have plenty of time to deal with the fallout from these kinds of changes.
Okay, so given that, any objections on back-porting some larger patches to our LLVM? Only dealing with the NVPTX back-end, of course. I was currently thinking of:
I’ve heard mixed story about Ryzen scheduler so I’m not sure what the difference actually is for workload we care about.
Not being able to use avx512 is LLVM <=4.0 bug that can’t be fixed without an upgrade.
For compile time I only know they started optimizing for it around/after 3.9 but I don’t find a lot of numbers. GCC 7.0 vs. LLVM Clang 4.0 Performance With Both Compiler Updates Coming Soon - Phoronix does show no clear regression in compile time with one clear improvement in 4.0. I can’t find a 5.0 compile time benchmark ATM.