Celeste is an interesting case because they don’t actually have an explicit dependency on MPI.jl, and it’s a bit tricky to figure out exactly what they were doing from looking at the source. However, they do mention in interviews:
“we have integrated the DTree scheduler and utilized MPI-3 one-sided communication primitives”
so they must have been hooking into the cluster’s MPI at some level.
What’s a processing pipeline? It’s a bunch of transformations and side-effects, like storing data into S3.
Data processing starts by retrieving data from somewhere, e.g. a DBMS or HDFS. The data is then processed by something, which is often a Spark cluster. So replace that Spark cluster with a single computer, so that everything can happen in RAM, and use a robust tool for the processing step. Maybe single-node Spark? Or even Dask, disk.frame, or vaex, or let the user choose.
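The retrieve, transform, store shape described above can be sketched on a single machine with everything held in RAM. This is a minimal illustration in plain Python (standard library only), not Spark, Dask, disk.frame, or vaex. Assumptions: a CSV string stands in for the DBMS/HDFS source, a local JSON file stands in for S3, and all function names (`retrieve`, `run_pipeline`, `store`, etc.) are hypothetical.

```python
import csv
import io
import json
import os
import tempfile


def retrieve(raw_csv: str) -> list[dict]:
    """Hypothetical stand-in for pulling data from a DBMS or HDFS: parse CSV text into rows."""
    return list(csv.DictReader(io.StringIO(raw_csv)))


def parse_amounts(rows: list[dict]) -> list[dict]:
    """Transformation: convert the 'amount' column from text to float."""
    return [{**r, "amount": float(r["amount"])} for r in rows]


def drop_small(rows: list[dict]) -> list[dict]:
    """Transformation: keep only rows with amount >= 10."""
    return [r for r in rows if r["amount"] >= 10.0]


def total_by_user(rows: list[dict]) -> list[dict]:
    """Transformation: aggregate amounts per user, sorted by user name."""
    totals: dict[str, float] = {}
    for r in rows:
        totals[r["user"]] = totals.get(r["user"], 0.0) + r["amount"]
    return [{"user": u, "total": t} for u, t in sorted(totals.items())]


def run_pipeline(rows: list[dict], transforms) -> list[dict]:
    """Apply each transformation in order; all intermediate data stays in RAM."""
    for transform in transforms:
        rows = transform(rows)
    return rows


def store(rows: list[dict], path: str) -> None:
    """Side effect at the end of the pipeline; a local file stands in for S3 here."""
    with open(path, "w") as f:
        json.dump(rows, f)


# Usage: a tiny made-up dataset run through the whole pipeline.
RAW = "user,amount\nalice,12.5\nbob,3.0\nalice,20\nbob,15\n"
result = run_pipeline(retrieve(RAW), [parse_amounts, drop_small, total_by_user])
print(result)  # [{'user': 'alice', 'total': 32.5}, {'user': 'bob', 'total': 15.0}]

out_path = os.path.join(tempfile.gettempdir(), "pipeline_output.json")
store(result, out_path)
```

Keeping each transformation a pure function and isolating the side effect in `store` is what would make it straightforward to swap the in-memory list for a larger single-node engine later, or to let the user choose one.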
There are now technologies that make large-memory machines available, provide very fast access to data over networks, and enable composable infrastructure.
Intel Optane persistent memory is a tier below RAM.
Dell can provide multi-terabyte single servers using persistent memory. I would have to check what the limits are.
You can have composable infrastructures, where you build a high-performance server on demand from a set of CPUs, GPUs, and high memory, which can then be torn down and redistributed for the next training or analysis run.
Dell works very closely with Liqid on this; have a look at this 12-terabyte server.
That’s amazing! Although it’s 12 TB of Optane, so it’s a bit slower than DDR RAM but faster than SSDs. Optane is definitely to SSDs what SSDs were to spinning hard drives.
You make it sound like the specialist architecture is ubiquitous, which it is not. Most Spark clusters suffer from poor network performance compared with fetching from disk. It will be many years before they can be considered “outdated”.
@xiaodai Servers have a mix of DRAM and Optane memory, not all Optane. So, depending on how much data you access at a time, performance may not be affected much. As usual, YMMV.
This is not a subject I am an expert in; however, if there is interest, I can track down someone from Intel to comment.