I started training a language model from scratch in Julia. I did not use pre-built libraries for the core: I built my own BPE tokenizer and my own training loop, faced hallucinations, and rebuilt.
I tried Flux and Lux. Both have strengths, but also critical weaknesses (CUDA conflicts, design limitations). After a long struggle, I found a different path that worked.
PythonCall played a key role, bridging Julia to Python’s ecosystem when needed. But Julia itself was the heart of the project.
What I learned is that Julia is not just “another language”. It is a platform for real understanding. If the community focuses on its unique strengths (speed, metaprogramming, Python interop), I believe Julia can surpass many expectations.
I am not sharing technical details now. But I wanted to confirm: Julia is inspiring. With more work, it can become much, much more.
I am sharing a screenshot as a proof of concept. The full code is not open-source at this stage. I want to document it properly first. I may share it later. I hope you understand and respect that.
#machinelearning
When I first came across Julia a few weeks ago, it really was a dream come true! It has so many language features and design choices that I adore (I once tried designing an intuitive language myself, and it turns out Julia is literally what I was looking for the whole time). I can confirm Julia is inspiring, and in my opinion at least, Julia is THE language (well, Julia is unironically a cult lol).
Please consider sharing the code; I would be really interested in taking a look. This Discourse is one of the best ways to learn and share Julia tips and tricks.
I would be especially interested in looking at the BPE tokenizer built from scratch.
Thank you for your interest. I understand that the Discourse is for learning, and I appreciate that.
However, as I mentioned in the original post, the code is not open-source at this stage. I am still documenting it and refining it.
Regarding the BPE tokenizer: I built it from scratch after studying multiple implementations. The general approach is standard (Byte-Pair Encoding), but my specific adaptation for Arabic text and the model’s vocabulary is what makes it unique.
I am not sharing the code yet, but I am happy to discuss the algorithm or the challenges I faced. What specifically would you like to know about BPE in Julia?
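For anyone following along who wants to experiment with the general algorithm in Julia, here is a minimal BPE merge-learning sketch. This is not the author's code, and it omits the Arabic-specific adaptations mentioned above; all function names are illustrative. It implements the standard procedure: count adjacent symbol pairs across a word-frequency table, merge the most frequent pair, and repeat.

```julia
# Minimal BPE sketch (illustrative only, not the author's implementation).

# Count frequencies of adjacent symbol pairs across all tokenized words.
function pair_counts(words::Dict{Vector{String},Int})
    counts = Dict{Tuple{String,String},Int}()
    for (syms, freq) in words
        for i in 1:length(syms)-1
            pair = (syms[i], syms[i+1])
            counts[pair] = get(counts, pair, 0) + freq
        end
    end
    return counts
end

# Replace every occurrence of `pair` in a word with the merged symbol.
function merge_pair(syms::Vector{String}, pair::Tuple{String,String})
    out = String[]
    i = 1
    while i <= length(syms)
        if i < length(syms) && (syms[i], syms[i+1]) == pair
            push!(out, syms[i] * syms[i+1])  # fuse the pair into one symbol
            i += 2
        else
            push!(out, syms[i])
            i += 1
        end
    end
    return out
end

# Learn `n_merges` merge rules from a word => frequency table.
function learn_bpe(corpus::Dict{String,Int}, n_merges::Int)
    # Start from individual characters (a real tokenizer would use bytes
    # and handle Unicode normalization, which matters for Arabic).
    words = Dict([string(c) for c in w] => f for (w, f) in corpus)
    merges = Tuple{String,String}[]
    for _ in 1:n_merges
        counts = pair_counts(words)
        isempty(counts) && break
        best = argmax(counts)  # key with the highest count (Julia >= 1.7)
        push!(merges, best)
        words = Dict(merge_pair(s, best) => f for (s, f) in words)
    end
    return merges
end

merges = learn_bpe(Dict("low" => 5, "lower" => 2, "lowest" => 3), 3)
```

The learned `merges` list, applied in order, is what a BPE tokenizer replays at encoding time. A production version would operate on bytes rather than characters and cache pair counts incrementally instead of recomputing them each iteration.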