LLVM patches for NVPTX

cudanative
llvm

#1

Since we’re going to stick with LLVM 3.9 for Julia 1.0, CUDA support is falling behind a little. Not only is NVPTX still under heavy development, NVIDIA also regularly adds new intrinsics with new CUDA versions (e.g., CUDA 9.0 adds synchronizing versions of many intrinsics, which have to be used on Volta GPUs).
For now, I have been backporting only the LLVM features I really need, and trying to work around other issues in CUDAnative.jl (e.g., emitting inline assembly using llvmcall). This is quite a hassle.

Any ideas for a better solution? I can’t expect users to build Julia with LLVM_VER=svn. Would it be OK to backport much larger patches, adding features from LLVM master to our 3.9 patch set? Even though they would be limited to the NVPTX back-end, it would, e.g., make it harder to support multiple LLVM versions (patch breakage, etc.).


#2

Are we?


#3

IIRC, that’s what I heard at the 1.0 roadmap talk, and didn’t @StefanKarpinski confirm that on Slack as well?

EDIT: https://youtu.be/qHpaztMu_Uw?t=428
But even then, the new sync intrinsics I mentioned only landed recently, so they would need to be backported to 5.0 too.


#4

Well, I don’t really remember “many” regressions (of any kind) from the 3.7.1 -> 3.9 update. The only one I can remember being attributed to it at some point was https://github.com/JuliaLang/julia/issues/19976, which was not an LLVM regression. In fact, there are still LLVM bugs affecting us that an upgrade would fix, like https://github.com/JuliaLang/julia/issues/14089

We had the most issues with the 3.3 -> 3.7.1 upgrade, but that’s just because of a major API change in LLVM. Nothing like that has happened since 3.7, and AFAICT the 3.7 -> 3.9 upgrade was very smooth.

The only issue I can think of that is affecting us when upgrading to 5.0 is https://github.com/JuliaLang/julia/issues/23949, which is due to the change in how we use LLVM, not LLVM changes. That can easily be changed to match 3.9 behavior if needed.

If we don’t upgrade to LLVM 5.0, we’ll have

  1. No AVX support on 32-bit (I guess that’s not a big deal)
  2. No Ryzen support
  3. No AVX-512 support

https://github.com/JuliaLang/julia/pull/21849 helps with 2, but still, not supporting the latest processors from both Intel and AMD months after their release looks pretty bad.

Unless significantly more features are added in the 6.0 cycle, I imagine that requires much less backporting effort, both in the number of patches and in how much surrounding code has changed.

Also, the thread about getting LLVM compilation performance good again happened in the 3.9 time frame, so I expect compilation performance not to degrade as much after that. But in the end, since I’m not using the generic binary, I guess I don’t care that much which LLVM is used by default… It mainly affects how much I want to help get LLVM 5.0 working for use cases I don’t have.


#5

Yes: we don’t have time to deal with the compile-time regressions and other nasty compilation bugs that are the inevitable fallout from upgrading LLVM for 1.0. Upgrading LLVM to 5.0/6.0 (and all our other dependencies) can and should be a top priority in 1.1, since infrastructure upgrades have no user-visible impact and post-1.0 we have plenty of time to deal with the fallout from these kinds of changes.


#6

Wow, no Ryzen support is a serious drawback. So will Julia not run on this processor at all, or will it just not support all optimizations/features?


#7

It will run, just without Ryzen-specific optimization. No AVX-512 support means AVX-512 instructions won’t be used.


#8

Okay, so given that, any objections to backporting some larger patches to our LLVM? Only dealing with the NVPTX back-end, of course. I’m currently thinking of:

Making those patches apply for all the LLVM versions we “support” (or, at least, 3.9 and 5.0) might be a bit of a hassle though.


#9

@yuyichao,

What would improve if Julia used a newer LLVM?
Have they improved the efficiency of the generated code significantly?

Just curious.

Thank you.


#10

I’ve already listed all the changes I know of earlier in this thread. As for how large the improvement/difference is:

I’ve heard mixed stories about the Ryzen scheduler, so I’m not sure what the actual difference is for the workloads we care about.
Not being able to use AVX-512 is an LLVM <= 4.0 bug that can’t be fixed without an upgrade.
For compile time, I only know they started optimizing for it around/after 3.9, but I haven’t found many numbers. https://www.phoronix.com/scan.php?page=article&item=gcc7-clang4-jan&num=4 shows no clear compile-time regression, with one clear improvement in 4.0. I can’t find a 5.0 compile-time benchmark ATM.