Developing a Beginner's Roadmap to Learn Julia High Performance Computing for Data Science

blackeneth · July 4, 2021, 3:51pm

One way to proceed is to copy successful training from others. One of the best is fast.ai, which is produced and given away for free by Jeremy Howard. If you listen to his interview on the Lex Fridman podcast, he makes a couple of observations:

most analysts don’t process huge datasets that require networks of computers
most analysts are just working on their single workstation with a single GPU
deep learning is the most advanced technique available and it’s not too hard to create state of the art models today

The very first exercise has you train a model to recognize cat pictures (historical fun fact: the internet was built to share cat pictures).

The course currently uses Python. I believe he is also developing a Swift version. There is also an effort to create a Julia version. See also the forum post at fast.ai.

So, one thing you could do would be to contribute to that.

Besides that, I think a “traditional” introduction to deep learning would start with tree models (CART), boosted trees, random forest, and support vector machines. All of those and more are available in the MLJ toolbox. One approach – a good one – would be to write tutorials that walk people through the MLJ toolbox. Now you might expect that has been done – and it has: Data Science Tutorials in Julia. You could add to that in either breadth or depth. Or, you might find the tutorials too advanced and create simpler baby-steps tutorials for absolute beginners in data science. You need to define your audience, find your niche, and go for it!
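
To give a feel for what a baby-steps tutorial on tree models might cover, here is a rough sketch of the core step in CART: picking the threshold on one feature that minimizes weighted Gini impurity. It is written in plain Python (the language the fast.ai course uses) with no libraries, and it is not how MLJ implements trees. The function names `gini` and `best_split` and the toy whisker data are made up for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best rule `x <= threshold`.

    If no split improves on the unsplit impurity, threshold is None.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(ys))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot place a threshold between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)  # midpoint between neighbours
    return best


# Toy data, invented for this example.
whisker_cm = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
labels = ["dog", "dog", "dog", "cat", "cat", "cat"]

threshold, impurity = best_split(whisker_cm, labels)
print(threshold, impurity)
```

A full CART tree applies this search over every feature and then recurses on the two halves. Boosted trees and random forests are built on the same step.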
