Developing a Beginner's Roadmap to Learn Julia High Performance Computing for Data Science

Celeste is an interesting case because they don’t actually have an explicit dependency on MPI.jl, and it’s a bit tricky to figure out exactly what they were doing from looking at the source, but they do mention in interviews:

“we have integrated the DTree scheduler and utilized MPI-3 one-sided communication primitives”

so they must have been hooking into the cluster’s MPI at some level.
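
For reference, here is roughly what MPI-3 one-sided communication looks like from Julia via MPI.jl. This is only an illustrative sketch, not Celeste’s actual code, and the exact MPI.jl signatures (e.g. MPI.Put!) have changed between package versions:

# onesided.jl -- sketch of MPI-3 one-sided communication via MPI.jl.
# Launch with mpiexec (or MPI.jl's mpiexecjl wrapper), e.g.:
#   mpiexecjl -n 2 julia onesided.jl
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Each rank exposes a local buffer as a "window" that other ranks can
# write into directly, without the owner posting a matching receive.
buf = zeros(Float64, 10)
win = MPI.Win_create(buf, comm)

MPI.Win_fence(0, win)                       # open an access epoch
if rank == 0
    MPI.Put!(fill(1.0, 10), win; rank = 1)  # write into rank 1's window
end
MPI.Win_fence(0, win)                       # close the epoch; data now visible on rank 1

MPI.free(win)
MPI.Finalize()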


Data cubes / multidimensional tiling: partitioning/splitting the data by dimensions.

For example, suppose we have only 3 dimensions:

  • Date/Time (e.g. splitting by month)
  • Spatial (Quadtiles; H3; S2)
  • Genus / Species / Taxon

And each cell has ~0…10,000 trees, so we can split the data into small “data tiles” (cubes).

So just generate a big task list into ./bio_tasklist.sh:

./julia_bio_task.sh 2021-01 ADBA Tilia
./julia_bio_task.sh 2021-01 ADBA Populus
./julia_bio_task.sh 2021-01 ADBA Fraxinus
./julia_bio_task.sh 2021-02 ADBA Tilia
./julia_bio_task.sh 2021-02 ADBA Populus
./julia_bio_task.sh 2021-02 ADBA Fraxinus
...
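
The task list itself can be generated programmatically; a minimal Julia sketch (the months, tiles, and taxa here are placeholder values):

# make_tasklist.jl -- hypothetical generator for ./bio_tasklist.sh,
# one line per (month, tile, taxon) cell of the data cube.
months = ["2021-$(lpad(m, 2, '0'))" for m in 1:12]
tiles  = ["ADBA"]                         # quadtile / H3 / S2 cell ids
taxa   = ["Tilia", "Populus", "Fraxinus"]

open("bio_tasklist.sh", "w") do io
    for month in months, tile in tiles, taxon in taxa
        println(io, "./julia_bio_task.sh $month $tile $taxon")
    end
end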

And run it with GNU parallel: "--jobs $(nproc)" sets the number of parallel tasks to the CPU count.

time parallel --delay 2 --jobs $(nproc) --results ./jobs/bio_tasks  -k  < ./bio_tasklist.sh
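
Each task can then dispatch to a small Julia worker; a hypothetical julia_bio_task.jl that the shell wrapper could invoke with julia julia_bio_task.jl "$@" (the file layout here is an assumption):

# julia_bio_task.jl -- hypothetical per-tile worker, invoked once per
# (month, tile, taxon) combination, e.g.:
#   julia julia_bio_task.jl 2021-01 ADBA Tilia
month, tile, taxon = ARGS

# Assumed layout: one input file per cell of the data cube.
infile  = joinpath("tiles", month, tile, "$(taxon).csv")
outfile = joinpath("results", "$(month)_$(tile)_$(taxon).csv")

println("processing $infile -> $outfile")
# ... per-tile processing/aggregation goes here ...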


By the way, there is some biological data in OSM, if you need it 🙂


What’s a processing pipeline? It’s a bunch of transformations and side-effects, like storing data into S3.

Data processing starts by retrieving data from somewhere, e.g. a DBMS, HDFS, etc. It is then processed by something, which is often a Spark cluster. So replace that Spark cluster with a single computer, so everything can happen in RAM. I.e. replace it with a robust single-node tool. Maybe single-node Spark? Or even Dask, disk.frame, vaex, or let the user choose.
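
For illustration, the same read-transform-store shape on a single node in Julia, using the CSV.jl and DataFrames.jl packages (the file names and the :species grouping column are made up):

# In-RAM pipeline sketch: retrieve -> transform -> side-effect.
using CSV, DataFrames

df  = CSV.read("observations.csv", DataFrame)         # retrieval (stand-in for DBMS/HDFS)
out = combine(groupby(df, :species), nrow => :count)  # transformation
CSV.write("species_counts.csv", out)                  # side-effect: persist the result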

There are technologies now which make large-memory machines available, give very fast access to data over networks, and enable composable infrastructure.
Intel Optane persistent memory is a tier below RAM.

Dell can provide multi-terabyte single servers using persistent memory. I would have to check what the limits are.

You can have composable infrastructures - where you build a high-performance server on demand with a set of CPUs, GPUs, and high memory - which can be torn down and redistributed for the next training or analysis run.
Dell work very closely with Liqid on this - have a look at this 12-terabyte server:

https://www.liqid.com/dell-technologies/solution-bundles/application/liqid-composable-high-memory-appliance

Also look at NVIDIA BlueField.


Comments about time to access data from network storage versus local storage on a server are getting a bit out of date.

Look at the Data Accelerator at the Dell Centre of Excellence in Cambridge

https://www.dell.com/support/kbdoc/en-us/000122853/dell-emc-data-accelerator-reference-architecture

Weka say their storage is FASTER than local disk.

Also look at Intel DAOS
https://www.intel.co.uk/content/www/uk/en/high-performance-computing/daos-high-performance-storage-brief.html


Oh so that’s what they’re using Optane for! Yeah, makes sense.

That’s amazing! Although it’s 12 TB of Optane, so it’s a bit slower than DDR RAM but faster than SSDs. Optane is definitely to SSDs what SSDs were to spinning hard drives.

You make it sound like this specialist architecture is ubiquitous, which it is not. Most Spark clusters suffer from poor network performance versus fetching from local disk. It will be many years before local storage can be considered “out-dated”.

@xiaodai Servers have a mix of DRAM and Optane memory - not all Optane. So depending on how much data you access at a time, performance may not be greatly affected. As usual, YMMV.

This is a subject that I am not expert in; however, if there is interest I can track down someone from Intel to comment.

Of course, all computers need DRAM to function, given current architectures. You mentioned only Optane, so I pointed at the Optane part.