Developing a Beginner's Roadmap to Learn Julia High Performance Computing for Data Science

Ah, so it’s basically applicable to anything you’d ever want to do in parallel in a distributed-memory context.

Reading/writing a big file? Use MPI_File_read_at and MPI_File_write_at to read and write in parallel (aside: MPI.jl generally follows the C API for MPI). Of course, you have to know a fair amount about the structure of the file for this to work well, and you'll want to be on a fast parallel filesystem (e.g. Lustre) so you don't simply get killed by I/O bandwidth, but people do this with big HDF5 files on clusters all the time. Or take batch processing: each MPI task can simply read a different file, and then you do whatever you need to with that information. If you have more files than tasks, you can have one or more tasks act as a scheduler to coordinate the others.
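For instance, a minimal MPI.jl sketch of the read-in-parallel pattern might look like the following. The file name, element type, and per-rank chunk size are made up for illustration, and the `MPI.File` keyword names assume a recent MPI.jl release:

```julia
using MPI

MPI.Init()
comm   = MPI.COMM_WORLD
rank   = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

# Hypothetical layout: a flat binary file of Float64s, split evenly across ranks.
nlocal = 1_000_000                        # elements this rank is responsible for
buf    = Vector{Float64}(undef, nlocal)

# Each rank reads its own disjoint chunk; under the default file view the
# offset is in bytes, mirroring MPI_File_read_at in the C API.
fh = MPI.File.open(comm, "bigdata.bin"; read=true)
MPI.File.read_at!(fh, rank * nlocal * sizeof(Float64), buf)
close(fh)

# ... process `buf` locally ...

MPI.Finalize()
```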

Now those examples by themselves are almost too easy to count as HPC, but you could do it. If you're going to do some sort of complicated calculation on that data, though, then that can definitely count (e.g. Celeste.jl). Or say that on top of that there is some inter-process communication you need to coordinate: say you need to extract different information from each file, share it with some neighboring tasks, and do something more complicated with the combined result. Then you're really going to be glad to be in an HPC environment with InfiniBand interconnects or the like, and glad to be managing the message passing manually so that you only send the bare minimum, because even with that fancy interconnect, latency between nodes is huge compared to latency within a node.
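As a rough sketch of that kind of minimal, explicit message passing, here is a ring exchange of a tiny per-task summary. The summary contents are invented for illustration, and the `MPI.Sendrecv!` keyword form assumes an MPI.jl 0.20-style API:

```julia
using MPI

MPI.Init()
comm   = MPI.COMM_WORLD
rank   = MPI.Comm_rank(comm)
nranks = MPI.Comm_size(comm)

# Hypothetical per-task result extracted from "this rank's" file; keep it
# small, since inter-node latency dwarfs anything within the node.
local_summary = Float64[rank, 2.0 * rank]

# Exchange summaries with the neighboring ranks in a ring.
right     = mod(rank + 1, nranks)
left      = mod(rank - 1, nranks)
from_left = similar(local_summary)

MPI.Sendrecv!(local_summary, from_left, comm;
              dest = right, source = left)

# ... combine `local_summary` and `from_left` however the problem requires ...

MPI.Finalize()
```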

It’s the hard limit on how much you can scale. If any part of your problem is serial, Amdahl’s law tells you just how quickly that part comes to dominate your runtime as you add workers. Consequently, it can often tell you a priori whether it’s even worth trying to scale a given code to 10 cores, or 100, or 100,000.
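A quick back-of-the-envelope version in plain Julia (the 5% serial fraction is just an example number):

```julia
# Amdahl's law: speedup on p workers for a code whose serial fraction is s,
#   S(p) = 1 / (s + (1 - s) / p)
amdahl_speedup(s, p) = 1 / (s + (1 - s) / p)

# Even with only 5% serial work, the speedup saturates near 1/s = 20,
# so 100,000 cores buys you almost nothing over 100.
for p in (10, 100, 100_000)
    println("p = $p  =>  speedup ≈ $(round(amdahl_speedup(0.05, p), digits=2))")
end
```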
