Going pure Julia (rather than using state-of-the-art code and algorithms) is probably not the better option, except as a learning exercise. But if you do, consider “1-bit networks” (from 2023, and from this week):
It’s very likely that if you redo some software, you’ll be reimplementing an outdated approach; e.g. transformers, in their current form, are likely going away.
With such 1- and 2-bit networks we’ve likely reached the end of the line for quantization, and it helps keep model size down. To stay competitive in training you need thousands of GPUs, and software that can target that many, so pure Julia seems out of the question there. But maybe you can go halfway: leave out some parts, like distributing across many GPUs, and use DeepSpeed or something for that.
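For reference, the core of the “1.58-bit” (ternary) weight quantization is simple enough to sketch in a few lines of Julia. This follows my understanding of BitNet b1.58’s absmean scheme (scale by the mean absolute value, then round-and-clip to {-1, 0, +1}); the function names are mine, not from any package:

```julia
# Minimal sketch of BitNet b1.58-style ternary weight quantization (my reading
# of the absmean scheme): divide by the mean absolute value, then round and
# clip every weight to {-1, 0, +1}. Illustrative only.

# Quantize a weight matrix to ternary values plus a single scale.
function quantize_ternary(W::AbstractMatrix{<:AbstractFloat})
    γ = sum(abs, W) / length(W) + eps(eltype(W))     # absmean scale
    Wq = Int8.(clamp.(round.(W ./ γ), -1, 1))        # ternary weights
    return Wq, γ
end

# Dequantize (or use Wq directly: the matmul degenerates to adds/subtracts).
dequantize_ternary(Wq, γ) = γ .* Wq

W = randn(Float32, 4, 4)
Wq, γ = quantize_ternary(W)
```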
Training from scratch is still very costly, and there’s no need to, since you can finetune an existing model for Julia use. But then you need to choose the best model to start from, and formats/quantization as in llama.cpp or the new bitnet.cpp from Microsoft. See on the former (and its relation to Llama2.jl):
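On formats: the gist of llama.cpp-style quantization is block-wise, e.g. Q4_0 groups weights into blocks of 32 that share one scale. A hedged Julia sketch of that idea follows; the real GGUF layout packs two 4-bit codes per byte and uses Float16 scales, so this is only the concept, not the on-disk format:

```julia
# Block-wise 4-bit quantization in the spirit of llama.cpp's Q4_0:
# blocks of 32 weights, one scale per block, 4-bit codes per weight.
# Details of the real format differ; this is a conceptual sketch.

const BLOCK = 32

# Quantize a flat weight vector into per-block scales and codes in 0:15.
function quantize_q4(w::Vector{Float32})
    @assert length(w) % BLOCK == 0
    nblocks = length(w) ÷ BLOCK
    scales = Vector{Float32}(undef, nblocks)
    codes  = Vector{UInt8}(undef, length(w))
    for b in 1:nblocks
        r = (b-1)*BLOCK+1 : b*BLOCK
        amax = maximum(abs, @view w[r])
        d = amax / 7                          # map [-amax, amax] onto ~ -7..7
        scales[b] = d
        for i in r
            q = d == 0 ? 0 : clamp(round(Int, w[i] / d), -8, 7)
            codes[i] = UInt8(q + 8)           # store unsigned 0..15 (4 bits)
        end
    end
    return scales, codes
end

dequantize_q4(scales, codes) =
    [scales[(i-1) ÷ BLOCK + 1] * (Int(codes[i]) - 8) for i in eachindex(codes)]
```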
KAN networks (they can be a drop-in replacement for the MLP part of transformers, if I recall) are worthwhile to reimplement in Julia:
KANs are likely not compatible with 1-bit networks, in the sense that their weights are larger, but they might still be a win if you can get away with fewer of them. The two are also not entirely contradictory, since you could still have a transformer whose other parts use 1-bit weights wherever KAN is not replacing the MLP part. But isn’t the MLP part the largest part of the total?
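To make the KAN idea concrete, here is a simplified, self-contained Julia sketch of a KAN layer as I understand it: each edge (input i → output j) gets its own learnable univariate function, and the output node just sums them. The original paper uses B-splines plus a SiLU base branch; to keep this short I use a small Gaussian basis on a fixed grid, and all names are illustrative, not from any existing package:

```julia
# Simplified KAN layer: y_j = Σ_i φ_{j,i}(x_i), where each φ is a learnable
# 1-D function (here: base SiLU branch + coefficients over a Gaussian basis).

struct KANLayer
    coef::Array{Float32,3}   # (out, in, nbasis) coefficients per edge
    base::Matrix{Float32}    # (out, in) weights for the SiLU base branch
    grid::Vector{Float32}    # shared basis centers
    width::Float32           # basis width
end

function KANLayer(in::Int, out::Int; nbasis::Int = 8, lo = -2f0, hi = 2f0)
    grid = Float32.(collect(range(lo, hi, length = nbasis)))
    KANLayer(0.1f0 .* randn(Float32, out, in, nbasis),
             0.1f0 .* randn(Float32, out, in),
             grid,
             Float32((hi - lo) / (nbasis - 1)))
end

silu(x) = x / (1 + exp(-x))

# Forward pass for one input vector x of length `in`.
function (l::KANLayer)(x::AbstractVector{Float32})
    out, inn, _ = size(l.coef)
    y = zeros(Float32, out)
    for i in 1:inn
        ϕ = exp.(-((x[i] .- l.grid) ./ l.width) .^ 2)   # basis activations
        for j in 1:out
            y[j] += l.base[j, i] * silu(x[i])           # base branch
            y[j] += sum(l.coef[j, i, :] .* ϕ)           # learnable 1-D function
        end
    end
    return y
end

layer = KANLayer(16, 32)
y = layer(randn(Float32, 16))
```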
I think it’s also worthwhile to help with this:
The best models will likely use new ways of multiplying that aren’t yet in software or hardware (but you could emulate them slowly(?) in software for compatibility until hardware catches up, or maybe just use FP8 or bfloat16, I don’t recall which, it might be compatible with that): https://arxiv.org/html/2410.00907v2#S2
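Since hardware support isn’t there yet, a slow software emulation is at least possible. Below is a rough Julia sketch of my reading of the paper’s L-Mul formula (Section 2): the mantissa product is replaced by adding the mantissas plus a small constant 2^(-l(m)). The function names and the default m = 23 (Float32 mantissa bits) are my own choices, and the paper targets low-bit mantissas, so treat this purely as an illustration, not a verified reimplementation:

```julia
# Rough, slow emulation of the L-Mul (linear-complexity multiplication) idea:
# replace the mantissa product with a cheap constant correction term.
# This is my reading of the paper's Section 2; sketch only.

# Offset l(m) for a mantissa of m bits (as I recall the paper's definition).
lmul_offset(m::Int) = m <= 3 ? m : (m == 4 ? 3 : 4)

# Approximate x*y by decomposing into sign, exponent and mantissa,
# then *adding* mantissas instead of multiplying them.
function lmul(x::Float32, y::Float32; m::Int = 23)
    (iszero(x) || iszero(y)) && return 0.0f0
    sx, sy = sign(x), sign(y)
    fx, ex = frexp(abs(x))          # abs(x) == fx * 2^ex with fx in [0.5, 1)
    fy, ey = frexp(abs(y))
    xm = 2fx - 1                    # mantissa fraction in [0, 1)
    ym = 2fy - 1
    # (1 + xm)*(1 + ym) ≈ 1 + xm + ym + 2^(-l(m))
    mant = 1 + xm + ym + 2.0f0^(-lmul_offset(m))
    return sx * sy * Float32(ldexp(mant, (ex - 1) + (ey - 1)))
end

lmul(3.0f0, 5.0f0)   # ≈ 14.5 vs the exact 15.0
```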