Community Interest Check: LLMs from Scratch in Pure Julia

That is so true :smiling_face_with_tear:

I understand that @Palli 's point is to overtake the incumbents on the curve by riding the rise of some new architectures. That may be hopeful, but a healthier approach is what @ToucheSir suggests here, so that we can prepare well for the next boom.

Based on my experience, there are still tons of missing components needed to train an LLM from scratch in pure Julia. (But don’t get me wrong, I’m still optimistic :wink: Personally, I think the biggest issue right now is that top-tier GPUs are still too expensive and not easily accessible to most people. But in the long run, I believe prices will come down and more developers will realize that they deserve a better programming language or framework. I’m talking about Megatron :rage: )

I got some bandwidth recently and tried to implement many deep generative models (VAE, GAN, VQGAN, LLaMA, MoE, DDPM) from scratch with Lux.jl. Honestly speaking, I really like its design. But to move on to large models trained across multiple nodes, we still have a lot of work to do. (And I’m not sure whether it is worth working in this direction right now.)
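For anyone curious what I mean by Lux.jl’s design: it uses explicit, functional parameter handling rather than mutable layer state. A minimal sketch with a toy MLP (layer sizes and names are my own, not from any of the models above):

```julia
using Lux, Random

# Define a small MLP; layers are stateless descriptions, not containers of weights.
model = Chain(Dense(4 => 8, relu), Dense(8 => 2))

# Parameters (ps) and state (st) live outside the model and are created explicitly.
rng = Random.default_rng()
ps, st = Lux.setup(rng, model)

# A forward pass takes (input, parameters, state) and returns (output, new state).
x = rand(rng, Float32, 4, 3)   # 4 features, batch of 3
y, st = model(x, ps, st)

size(y)  # (2, 3): 2 outputs per sample, batch of 3
```

Because `ps` is just a plain nested structure, sharding or replicating it across devices is conceptually straightforward, which is part of why I think this design is a good foundation for the distributed work below.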

A more practical roadmap, from my point of view, is probably:

  • Single-card inference
  • Multi-card / multi-node inference
  • Single-card training
  • Multi-card training
  • Multi-node training

And it would be great to have one public repo where people can report benchmark results, so that we can all see what others have achieved so far.
