My journey training an LLM from scratch in Julia (and why I see huge potential)

I started training a language model from scratch in Julia. No pre-built libraries for the core: I wrote my own BPE tokenizer and my own training loop, faced hallucinations, and rebuilt.

I tried Flux and Lux. Both have strengths, but also critical weaknesses (CUDA conflicts, design limitations). After a long struggle, I found a different path that worked.

PythonCall played a key role, bridging Julia to Python’s ecosystem when needed. But Julia itself was the heart of the project.
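As a minimal sketch of what that bridging looks like (the package here is just an example; the post does not say which Python packages were actually used):

```julia
using PythonCall

np = pyimport("numpy")                        # load a Python module from Julia
py_arr = np.ones(3)                           # call Python directly
jl_vec = pyconvert(Vector{Float64}, py_arr)   # convert the result to a Julia type
```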

What I learned is that Julia is not just “another language”. It is a platform for real understanding. If the community focuses on its unique strengths (speed, metaprogramming, Python interop), I believe Julia can surpass many expectations.

I am not sharing technical details now. But I wanted to confirm: Julia is inspiring. With more work, it can become much, much more.
I am sharing a screenshot as a proof of concept. The full code is not open-source at this stage. I want to document it properly first. I may share it later. I hope you understand and respect that.

When I first came across Julia a few weeks ago, it really was a dream come true! So many language features and design choices that I adore (I once tried designing an intuitive language myself, and it turns out Julia is literally what I was looking for the whole time). I can confirm Julia is so inspiring, and in my opinion at least, Julia is THE language (well, Julia is unironically a cult lol).

Please consider sharing the code; I would be really interested in taking a look. This Discourse is one of the best places to learn and share Julia tips and tricks.

I would be especially interested in looking at the BPE tokenizer built from scratch.

Thank you for your interest. I understand that the Discourse is for learning, and I appreciate that.

However, as I mentioned in the original post, the code is not open-source at this stage. I am still documenting it and refining it.

Regarding the BPE tokenizer: I built it from scratch after studying multiple implementations. The general approach is standard (Byte-Pair Encoding), but my specific adaptation for Arabic text and the model’s vocabulary is what makes it unique.
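To give a sense of the standard part: the core of BPE is counting adjacent token pairs and repeatedly merging the most frequent pair into a new token. Here is a minimal textbook sketch in Julia (this is not my implementation, and the Arabic-specific handling is deliberately omitted; names like `bpe_train` are just illustrative):

```julia
# Minimal sketch of the standard BPE merge loop (illustrative only).
function most_frequent_pair(tokens::Vector{String})
    counts = Dict{Tuple{String,String},Int}()
    for i in 1:length(tokens)-1
        pair = (tokens[i], tokens[i+1])
        counts[pair] = get(counts, pair, 0) + 1
    end
    isempty(counts) && return nothing
    return argmax(counts)  # key (pair) with the highest count
end

function merge_pair(tokens::Vector{String}, pair::Tuple{String,String})
    merged = String[]
    i = 1
    while i <= length(tokens)
        if i < length(tokens) && (tokens[i], tokens[i+1]) == pair
            push!(merged, tokens[i] * tokens[i+1])  # fuse the pair into one token
            i += 2
        else
            push!(merged, tokens[i])
            i += 1
        end
    end
    return merged
end

# Start from characters (real byte-level BPE would start from bytes)
# and apply up to `num_merges` merges.
function bpe_train(text::String, num_merges::Int)
    tokens = string.(collect(text))  # one token per character
    merges = Tuple{String,String}[]
    for _ in 1:num_merges
        pair = most_frequent_pair(tokens)
        pair === nothing && break
        push!(merges, pair)
        tokens = merge_pair(tokens, pair)
    end
    return tokens, merges
end
```

The interesting work is in how the base alphabet and pre-tokenization are defined, which is where the Arabic-specific adaptation lives.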

I am not sharing the code yet, but I am happy to discuss the algorithm or the challenges I faced. What specifically would you like to know about BPE in Julia?

[Screenshot: confirmation using Julia]

Good luck with your dream!