I’m developing a package (announcement in the coming months!) targeted to run on my university’s supercomputer. Due to funding and maintenance-priority limitations, its “operating environment” is out of date: it’s a ~2014 POWER8 system running Ubuntu 16.04 on the PowerPC architecture, with NVIDIA P100s running CUDA 8. So far, the Julia 1.7.0-rc2 binary is the most recent version we’ve found that seems to work. With that in mind, what are some good workflow tips for developing packages targeted at older Julia versions? For example, some dependencies could include CUDA.jl, Adapt.jl, and ClusterManagers.jl.
Is a 2014 supercomputer actually going to be faster than a decent, somewhat modern AMD server? Multi-node overhead seems unlikely to be worth it for 10-year-old hardware.
Oh absolutely, though I do believe that, while a single RTX 4090 GPU has better single-precision performance than the 12 P100s in this system combined, the P100s have better double-precision performance. Really, the goal of the project is to make this package ready to use on any* HPC platform. In other words, I’m using this supercomputer as a testing ground to develop something that can be dropped into something like what’s at a national lab. Additionally, this process will allow other folks to run it locally at their own “maintenance limited” institutions.
I would develop normally with the newest release and add a CI test with Julia 1.7. That’s how we normally keep compatibility with older Julia versions.
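As a rough sketch of what that can look like, assuming the package lives on GitHub and uses GitHub Actions with the `julia-actions` actions (the file name, branch name, and job layout here are only illustrative): set `julia = "1.7"` under `[compat]` in `Project.toml`, then test both the oldest supported version and the latest release in a matrix.

```yaml
# .github/workflows/CI.yml (hypothetical file name and layout)
name: CI
on:
  push:
    branches: [main]   # assumed default branch name
  pull_request:
jobs:
  test:
    name: Julia ${{ matrix.version }}
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false   # let the other version finish even if one fails
      matrix:
        version:
          - '1.7'   # oldest Julia version the package must keep working on
          - '1'     # latest stable 1.x release, used for day-to-day development
    steps:
      - uses: actions/checkout@v4
      - uses: julia-actions/setup-julia@v2
        with:
          version: ${{ matrix.version }}
      - uses: julia-actions/julia-buildpkg@v1   # instantiate and build the package
      - uses: julia-actions/julia-runtest@v1    # run test/runtests.jl
```

Note that `'1.7'` here resolves to the latest 1.7.x patch release rather than 1.7.0-rc2, and that hosted runners are typically x86_64 machines without a GPU, so this only checks Julia-version compatibility. Anything specific to PowerPC or CUDA 8 would still need to be tested on the cluster itself.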
Do you have any idea how long this system will stay up? I only see PowerPC clusters being retired nowadays (like Summit).
If I were you I would try to get access to a more modern cluster. It’s often easier than one might think.
Where are you located? Perhaps we can point you somewhere.
Thank you for your offer, but honestly, I am not currently interested in the actual performance of the machine. As I mentioned above:
Additionally, I wanted to ask the question to collect resources and ideas to develop on hardware that doesn’t support the more recent versions of the underlying software such as LLVM; this could possibly even extend to embedded systems due to their high constraints, but I’m much less familiar with those cases.