LLVM patches for NVPTX

maleadt · October 4, 2017, 11:37am

Since we’re going to stick with LLVM 3.9 for Julia 1.0, CUDA support is falling behind a little. Not only is NVPTX still under heavy development, but NVIDIA also regularly adds new intrinsics with new CUDA versions (e.g. CUDA 9.0 adds synchronizing versions of many intrinsics, which have to be used on Volta GPUs).
For now, I have been backporting only the LLVM features I really need and working around other issues in CUDAnative.jl (e.g. by emitting inline assembly using llvmcall). This is quite a hassle.

Any ideas for a better solution? I can’t expect users to build Julia with LLVM_VER=svn. Would it be OK to backport much larger patches, adding features from LLVM master to our 3.9 patch set? Even though they would be limited to the NVPTX back-end, it would, e.g., make it harder to support multiple LLVM versions (patch breakage, etc.).

yuyichao · October 4, 2017, 11:58am

Are we?

maleadt · October 4, 2017, 12:21pm

IIRC, that’s what I heard in the 1.0 roadmap talk, and didn’t @StefanKarpinski confirm that on Slack as well?

EDIT: see the JuliaCon 2017 talk "Julia Roadmap" by Stefan Karpinski (YouTube). But even then, the new sync intrinsics I mentioned only landed recently, so they would need to be backported to 5.0 too.

yuyichao · October 4, 2017, 3:08pm

Well, I don’t really remember “many” regressions (of any kind) from the 3.7.1 → 3.9 update. The only one I can remember being attributed to it at some point was JuliaLang/julia#19976 (“LLVM generates bad code on newer architectures”), which was not an LLVM regression. In fact, there are still LLVM bugs affecting us that will be fixed by an upgrade, such as JuliaLang/julia#14089 (“floating point `div` differs in optimized code”).

We had the most issues with the 3.3 → 3.7.1 upgrade, but that’s just because of a major API change in LLVM. Nothing like that has happened since 3.7, and AFAICT the 3.7 → 3.9 upgrade was very smooth.

The only issue I can think of that affects us when upgrading to 5.0 is JuliaLang/julia#23949 (“GC frame lowering doesn't handle vector well”), which is due to a change in how we use LLVM, not to LLVM changes. That can easily be changed to match the 3.9 behavior if needed.

If we don’t upgrade to LLVM 5.0, we’ll have

No AVX support on 32-bit (I guess that’s not a big deal)
No Ryzen support
No AVX-512 support

https://github.com/JuliaLang/julia/pull/21849 helps with the second point, but still, not supporting the latest processors from both Intel and AMD months after their release looks pretty bad.

Unless significantly more features are added in the 6.0 cycle, I imagine that requires much less backporting effort, both in the number of patches and in how much other code has changed.

Also, the thread about getting LLVM compilation performance good again happened in the 3.9 time frame, so I expect compilation performance not to regress as much after that. But in the end, since I’m not using the generic binary, I guess I don’t care that much which LLVM is used by default… It mainly affects how much I want to help get LLVM 5.0 working for use cases I don’t have.

StefanKarpinski · October 4, 2017, 4:40pm

Yes: we don’t have time to deal with the compile time regressions and other nasty compilation bugs that are the inevitable fallout from upgrading LLVM for 1.0. Upgrading LLVM to 5.0/6.0 (and all our other dependencies) can and should be a top priority for 1.1, since infrastructure upgrades have no user-visible impact and post-1.0 we will have plenty of time to deal with the fallout from these kinds of changes.

wrgr · October 4, 2017, 6:40pm

Wow, no Ryzen support is a serious drawback. So will Julia not run on this processor at all, or just not support all optimizations/features?

yuyichao · October 4, 2017, 7:29pm

It will run; it just won’t get the Ryzen-specific optimizations. Likewise, no AVX-512 support means AVX-512 instructions won’t be used.

maleadt · October 5, 2017, 5:47am

Okay, so given that, any objections to backporting some larger patches to our LLVM? Only ones dealing with the NVPTX back-end, of course. I’m currently thinking of:

D38191 [NVPTX] added match.{any,all}.sync instructions, intrinsics & builtins.
D38148 [NVPTX] Implemented bar.warp.sync, barrier.sync, and vote{.sync} instructions/intrinsics/builtins.
D38090 [NVPTX] Implemented shfl.sync instruction and supporting intrinsics/builtins.

Making those patches apply for all the LLVM versions we “support” (or, at least, 3.9 and 5.0) might be a bit of a hassle though.

RoyiAvital · October 5, 2017, 9:33am

@yuyichao,

What would improve if Julia used a newer LLVM?
Have they improved the efficiency of the generated code significantly?

Just curious.

Thank you.

yuyichao · October 5, 2017, 12:15pm

I’ve already listed all the changes I know of in my earlier post in this thread (#4). As for how big the improvement/difference is:

I’ve heard mixed stories about the Ryzen scheduler, so I’m not sure what the difference actually is for the workloads we care about.
Not being able to use AVX-512 is an LLVM <= 4.0 bug that can’t be fixed without an upgrade.
For compile time, I only know they started optimizing for it around/after 3.9, but I haven’t found many numbers. The Phoronix article “GCC 7.0 vs. LLVM Clang 4.0 Performance With Both Compiler Updates Coming Soon” shows no clear compile-time regression, and one clear improvement in 4.0. I can’t find a 5.0 compile time benchmark at the moment.
