AMDGPU.jl status

I’m beginning a fairly large project that is destined to run on a newly commissioned HPC cluster that will be using AMD’s Milan CPUs and MI250 GPUs. Once we have proven that the algorithm works on the CPU, it will be ported to the MI250s. At the moment, whilst we have a prototype written in Julia, we’re still in an evaluation stage for our choice of tooling.

Julia is a super attractive language for our project and makes so much of the coding experience straightforward. However, I am concerned about whether the AMDGPU.jl project is something we can rely on, i.e. is it healthy?

Does anyone who is part of the project feel they can speak to its overall health?


When developing HPC software you will often need to work with HPC engineers to get your software running and optimized for the target architecture. From my experience, there are many opportunities to discover quirks and bugs, in particular when you are targeting newly commissioned hardware.
This can be a major drain on your time, in particular if you are using unproven software like AMDGPU.jl.

On the other hand, this is a fantastic opportunity to test and develop AMDGPU.jl.

I suggest you loop your HPC support team into this discussion to see if they are willing to support you on this. It is their job to make your life as easy as possible on HPC, which should include supporting the tools you think are best for your research.


As AMDGPU’s maintainer, I would say that it is ready to be used for many kinds of codes! There are some missing features that need implementing (mainly integration with ROCm external libraries), but the core is solid. @vchuravy has recently gotten the entire GPUArrays test suite passing on AMDGPU, so we know that basically all the core functionality is there and working.

@vchuravy and I both work at the JuliaLab on GPU computing, and we’ve got grants indirectly funding its development. We’ve got other contributors who are also looking to run their code on supercomputers sporting AMD GPUs (specifically Frontier), and we plan to work with them to ensure that AMDGPU.jl is up to the task.

I would go forward with writing your code in Julia, and if you run into any problems, please file issues on AMDGPU.jl’s repository. We’d be happy to help you get it working for your use case!


@jpsamaroo thank you for your helpful reply! It’s also very helpful to know who to reach out to when we run into trouble. If you’re curious, this is to run on the Setonix cluster at the Pawsey Supercomputing Centre in Australia - we’re also very aware of our colleagues’ work at Frontier and are hoping that it can pave the way a bit for us.

I am not sure of the exact state of KernelAbstractions.jl at the moment, but you can start developing with it on the CPU and then switch to the GPU without changing the code, only some configuration.
Home · KernelAbstractions.jl (juliagpu.github.io)
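
To illustrate the idea, here is a minimal sketch, assuming a recent KernelAbstractions.jl release (0.9 or later, where `get_backend` and `KernelAbstractions.synchronize` are available). The names `axpy_kernel!` and `axpy!` are made up for this example. The same kernel should run on whichever backend the input arrays live on, so moving to the GPU is a matter of changing the array type rather than the kernel.

```julia
using KernelAbstractions

# Backend-agnostic kernel: y[i] = a * x[i] + y[i]
@kernel function axpy_kernel!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] = a * x[i] + y[i]
end

# Hypothetical wrapper: picks the backend from the array type of `y`.
function axpy!(y, a, x)
    backend = get_backend(y)            # CPU() for a plain Array
    kernel! = axpy_kernel!(backend)     # instantiate the kernel for that backend
    kernel!(y, a, x; ndrange = length(y))
    KernelAbstractions.synchronize(backend)  # wait for the launch to finish
    return y
end

# Develop and test on the CPU first.
x = ones(Float32, 1024)
y = fill(2f0, 1024)
axpy!(y, 3f0, x)

# Assumption: with AMDGPU.jl loaded on a machine with a supported GPU,
# passing device arrays instead should be the only change needed, e.g.
#   using AMDGPU
#   axpy!(ROCArray(y), 3f0, ROCArray(x))
```

Selecting the backend from the array itself keeps the calling code identical on CPU and GPU, which is what makes it practical to prove the algorithm on the CPU before porting.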

Hello Julia devs! Here via Hacker News. Do you have any of the ROCm developers contributing to this effort? If not, I’d like to volunteer as such. I mostly work on OpenMP, but that gives quite a lot of exposure to the HSA interface.


Hey there! We do not currently have any ROCm developers engaged for AMDGPU.jl development. We’d be happy to have more contributors helping out, and HSA experience is hugely welcome! Please feel free to contact me via PM here if you need any help getting up to speed or have any questions.

Cool, thanks! I’m good for now. I have been vaguely watching the Julia project for a while and have a background in Lisp and LLVM. I will go and read through some of the AMDGPU-specific source.

I’ve been working at the sharp end of the HSA interface for a couple of years and know who to ask about things I haven’t seen before. Hopefully I can save people some time there.


Absolutely amazing to hear that someone with a lot of experience is helping the development of this package!