AMDGPU.jl status

I’m beginning a fairly large project that is destined to run on a newly commissioned HPC cluster using AMD’s Milan CPUs and MI250 GPUs. Once we have proven the algorithm works on the CPU, it will be ported to the MI250s. At the moment, whilst we have a prototype written in Julia, we’re still in an evaluation stage for our choice of tooling.

Julia is a super attractive language for our project and makes so much of the coding experience straightforward. However, I am concerned about whether the AMDGPU.jl project is something we can rely on, i.e. is it healthy?

Does anyone who is part of the project feel they can speak to its overall health?

7 Likes

When developing HPC software you will often need to work with HPC engineers to get your software running and optimized for the target architecture. From my experience, there are many opportunities to discover quirks and bugs, in particular when you are targeting newly commissioned hardware.
This can be a major drain on your time, in particular if you are using unproven software like AMDGPU.jl.

On the other hand, this is a fantastic opportunity to test and develop AMDGPU.jl.

I suggest you loop your HPC support team into this discussion to see if they are willing to support you on this. It is their job to make your life as easy as possible on HPC, which should include supporting the tools you think are best for your research.

1 Like

As AMDGPU’s maintainer, I would say that it is ready to be used for many kinds of codes! There are some missing features that need implementing (mainly integration with ROCm external libraries), but the core is solid. @vchuravy has recently gotten the entire GPUArrays test suite passing on AMDGPU, so we know that basically all the core functionality is there and working.
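
To give a concrete sense of what “core functionality” means here, a minimal sketch of the array-level (GPUArrays-style) interface that the test suite exercises; the specific values are just for illustration:

```julia
using AMDGPU

# Upload a host array to the GPU
A = ROCArray(rand(Float32, 1024))

# Broadcasting fuses into a single GPU kernel
B = A .^ 2 .+ 1f0

# Reductions run on the device
s = sum(B)

# Copy the result back to the host
host = Array(B)
```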

@vchuravy and I both work at the JuliaLab on GPU computing, and we’ve got grants indirectly funding its development. We’ve got other contributors who are also looking to run their code on supercomputers sporting AMD GPUs (specifically Frontier), and we plan to work with them to ensure that AMDGPU.jl is up to the task.

I would go forward with writing your code in Julia, and if you run into any problems, please file issues on AMDGPU.jl’s repository. We’d be happy to help you get it working for your use case!

30 Likes

@jpsamaroo thank you for your helpful reply! It’s also very helpful to know who to reach out to when we run into trouble. If you’re curious, this is to run on the Setonix cluster at the Pawsey Supercomputing Centre in Australia. We’re also very aware of our colleagues’ work at Frontier and are hoping that can pave the way a bit for us.

1 Like

I’m not sure exactly what the state of KernelAbstractions.jl is at the moment, but you can start developing on the CPU and then switch to the GPU without changing your code, just some configuration (see the sketch below the link):
Home · KernelAbstractions.jl (juliagpu.github.io)
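
For illustration, here’s a minimal sketch of a backend-agnostic kernel. It uses the current KernelAbstractions.jl API (which has changed across versions), and assumes AMDGPU.jl provides the `ROCBackend`/`ROCArray` types for the GPU side:

```julia
using KernelAbstractions

# Backend-agnostic SAXPY kernel: y .+= a .* x
@kernel function saxpy!(y, a, @Const(x))
    i = @index(Global)
    @inbounds y[i] += a * x[i]
end

# Run on the CPU backend first...
backend = CPU()
x = rand(Float32, 1024)
y = zeros(Float32, 1024)
saxpy!(backend)(y, 2f0, x; ndrange = length(y))
KernelAbstractions.synchronize(backend)

# ...then the same kernel on an AMD GPU is just a configuration change:
#   using AMDGPU
#   backend = ROCBackend()
#   x, y = ROCArray(x), ROCArray(y)
```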

3 Likes

Hello Julia devs! Here via Hacker News. Do you have any ROCm developers contributing to this effort? If not, I’d like to volunteer as such. I mostly work on OpenMP, but that gives quite a lot of exposure to the HSA interface.

11 Likes

Hey there! We do not currently have any ROCm developers engaged in AMDGPU.jl development. We’d be happy to have more contributors helping out, and HSA experience is hugely welcome! Please feel free to contact me via PM here if you need any help getting up to speed or have any questions.

1 Like

Cool, thanks! I’m good for now, have been vaguely watching the Julia project for a while and have a background in Lisp and LLVM. Will go and read through some of the AMDGPU-specific source.

I’ve been working at the sharp end of the HSA interface for a couple of years and know who to ask about things I haven’t seen before. Hopefully I can save people some time there.

11 Likes

Absolutely amazing to hear that someone with a lot of experience is helping with the development of this package!

1 Like

Hi, I’m a PyTorch developer and newbie Julia developer. I manage a small team that does 3D graphics and ML applied research. I think Julia has bright potential (despite some remaining rough edges) and I want to help. I’m trying to promote Julia usage within AMD, so I’m in touch with Jon and a dozen other developers who have some Julia experience or strong interest. I’m not in charge of the Compute/HPC side of things, and I don’t speak officially on behalf of AMD - I’m just sharing my personal opinions.

I can’t make any promises yet, but I’m trying to make the case to AMD execs that we should start getting involved in the Julia ecosystem, i.e. have engineers dedicated at least part-time to helping make Julia+AMD work reliably and easily, and produce fast code. I’m still at the stage of gathering input and ideas to make the business case. I’d like to arrange a conf call between interested parties, e.g. @vchuravy, @jpsamaroo. If you’d like to offer input and/or be considered for later conf calls, please email me at (my user name on discourse) at amd.com, and let me know what time zone you’re in, what your interests are, etc., or just post your ideas and concerns in this thread.

49 Likes

It’s a year later, @claforte, and I wonder how things went?

(I had a couple meetings with an AMD sales guy here in Perth, but he seemed very uninterested in entertaining my concerns regarding AMD taking an active role in Julia GPU work.)

1 Like

Adding my own input (@pxl-th may have some input as well):

The Julia 3D Neural Graphics project previously headed by @claforte has been, and continues to be, a big success. The team has implemented a large number of algorithms, from NeRF to diffrast, along with a nice GUI in GitHub - JuliaNeuralGraphics/NerfGUI.jl. We still have a performance gap to close, as CUDA.jl outperforms AMDGPU.jl on these codes with similarly-performing GPUs, but most of the issues are known and we’re making good progress.

On the side of interest in using Julia within AMD, I personally haven’t heard much else, which is unfortunate. But I also think it’s still quite early to pass judgement, as the biggest users of AMDGPU.jl are on very new supercomputers, and we’re just now at the point where we’re going to be presenting results and retrospectives at conferences (PASC23 being a notable example). I think once this year’s conferences are concluded, we should start getting more positive responses from AMD as they realize that people are actually doing Real Science at Exascale with Julia :smile:

9 Likes