Hi!
I’m trying to wrap the Radeon collective communication library (RCCL), the ROCm counterpart to NCCL. I’m taking heavy inspiration from NCCL.jl for the wrapping strategy, since RCCL itself closely mimics NCCL. I have questions about the artifacts though: NCCL.jl depends on NCCL_jll.jl, and I can see that CUDA.jl also lets you choose between downloading artifacts and using the local toolkit. However, ROCm usually ships with RCCL bundled, and AMDGPU.jl basically only uses local (non-JLL) libraries. There is an RCCL repo on GitHub though. I guess my questions are:
Should I go the JLL route, i.e. write a build script for Yggdrasil that builds RCCL from their GitHub repo and wrap that, or go the AMDGPU.jl route and do pure local discovery?
If the latter, can someone explain to me exactly how the AMDGPU.jl discovery process and wrapping work? My understanding is that the discovery.jl script has some functions to locate all the libraries like rocBLAS etc., and when the module is loaded, the code inside __init__() stores the paths in global variables called librocblas, librocfft and so on. How do the various librocFFT.jl, librocBLAS.jl etc. know how to call those particular .so files? I don’t see any Libdl.dlopen() calls anywhere.
Sorry if these questions are basic, this is definitely harder than anything I’ve done so far in Julia and I’m hoping to learn a lot while doing it!
Welcome! This would definitely be nice to have, thanks for taking the initiative!
Going the JLL route will probably be a rather large undertaking. I looked into building ROCm on Yggdrasil before, but ROCm has a very specific build setup and, AFAIK, upstream doesn’t support cross-compilation. I’d go with local discovery for now.
Yes, that’s correct
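To make that pattern concrete, here is a minimal sketch of local discovery. It is not AMDGPU.jl's actual code: the module name, the `find_lib` and `rocm_lib_dirs` helpers, the `ROCM_PATH` environment variable and the `/opt/rocm` default are all assumptions for illustration. Only `Libdl.find_library` is a real stdlib call; it returns an empty string when nothing is found.

```julia
module RCCLDiscoverySketch

using Libdl

# Hypothetical holder for the discovered path; stays "" if nothing is found.
const librccl = Ref{String}("")

# Look for a shared library in the given directories (and the default loader
# paths). Returns the name/path that could be opened, or `nothing`.
function find_lib(names::Vector{String}, dirs::Vector{String})
    path = Libdl.find_library(names, dirs)
    return isempty(path) ? nothing : path
end

# Candidate ROCm library directories (assumed defaults; installs may differ).
function rocm_lib_dirs()
    root = get(ENV, "ROCM_PATH", "/opt/rocm")
    return [joinpath(root, "lib"), joinpath(root, "lib64")]
end

# Runs automatically when the module is loaded and records the path, if any.
function __init__()
    path = find_lib(["librccl"], rocm_lib_dirs())
    path === nothing || (librccl[] = path)
end

end # module
```

After loading, `RCCLDiscoverySketch.librccl[]` holds either the discovered path or an empty string, so the wrapper can check availability before making any calls.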
The library paths are simply passed directly to the ccall, for example here. You don’t usually need to call dlopen yourself.
Note that these wrappers are automatically generated using Clang.jl via the scripts located in the gen directory of AMDGPU.jl (JuliaGPU/AMDGPU.jl on GitHub, master branch), so I would recommend a similar approach for RCCL.
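As a small illustration of calling into a library without an explicit dlopen, here is a sketch that uses the C library as a stand-in for something like librocfft, so it runs without ROCm installed. The names `libc_path` and `c_strlen` are made up for this example, and the candidate library names are assumptions for Linux and macOS.

```julia
using Libdl

# Stand-in for a discovered path such as `librocfft`. The candidate names
# are assumptions for Linux/macOS; other platforms need different names.
const libc_path = Libdl.find_library(["libc.so.6", "libc", "libSystem"])

# No dlopen here: ccall loads the library named in the tuple on first use
# and looks up the `strlen` symbol inside it.
c_strlen(s::String) = ccall((:strlen, libc_path), Csize_t, (Cstring,), s)

c_strlen("rccl")  # returns 4
```

The same shape applies to a generated wrapper: the library variable goes in the second slot of the tuple and the symbol name in the first.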
Yes, I can imagine. What I was referring to was this repo, which seems to contain only RCCL. But it’s not clear how decoupled it actually is from the rest of ROCm, so it makes a lot of sense to go with local discovery.
Aaah I see, so when I prepend e.g. librocfft to the function call in @ccall, it is the macro that takes care of “opening” that path and finding the function inside. Thanks, this was the step I was missing.
Yes, I am using Clang.jl and have tried to stay as close to NCCL.jl as possible, since even the function names are the same. Thank you very much for your pointers. I’ll keep you posted about my progress!