Sequence language models in Julia

Dear All,

Has anyone implemented / ported sequence models like Mamba, Hyena, or S4 / S5 to Julia? I would be especially interested in already trained models.

Thanks a lot in advance for any answers.
Tomas

4 Likes

I’m also interested in this.

Seems like Julia doesn’t have much going on in terms of training or inference for LLMs in general.

For example, “Julia language llama gpt” returns this thread from 2023: LLaMA in Julia? - #12 by ImreSamu. It mentions GitHub - cafaxo/Llama2.jl: Julia package for inference and training of Llama-style language models as a pure-Julia implementation of LLaMA. As of today, the last commit was a year ago (42 commits total) and the repo has 140 stars. IMO this is extremely low for an LLM project given the current hype. I don’t think the package is “bad”; I think very few people know about it or are using it. Meanwhile, the PyTorch-based HuggingFace Transformers is widely used for running LLMs.

GitHub - rai-llc/LanguageModels.jl: Load nanoGPT-style transformers in Julia. Code ported from @karpathy's llama2.c was last active 2 years ago and has 61 stars. Meanwhile GitHub - ggml-org/llama.cpp: LLM inference in C/C++ has 82 thousand stars. Why is it C++ instead of Julia? This is more of a rhetorical question…

A substantial LLM ecosystem (GitHub - chengchingwen/Transformers.jl: Julia Implementation of Transformer models, GitHub - chengchingwen/BytePairEncoding.jl: Julia implementation of Byte Pair Encoding for NLP, GitHub - chengchingwen/NeuralAttentionlib.jl: Reusable functionality for defining custom attention/transformer layers, etc.) is maintained by chengchingwen. The Transformers repo has a respectable 500+ stars; the rest have barely a hundred, even though their functionality is absolutely fundamental. BytePairEncoding.jl is basically Julia’s only alternative to tiktoken (GitHub - openai/tiktoken: tiktoken is a fast BPE tokeniser for use with OpenAI's models., 15k stars), so why isn’t it extremely popular? I’m using it right now, and it seems to be the only way of accessing ChatGPT’s tokenizers locally from Julia. Why doesn’t it have the thousands of stars it deserves? Maybe Julia programmers aren’t that much into LLMs? Why? “Everyone else” seems to be extremely excited about them.
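
For context on what these tokenizer packages do under the hood, here is a tiny, self-contained sketch of the BPE idea in plain Julia. This is only an illustration of the algorithm, not the BytePairEncoding.jl (or tiktoken) API; all names in it are made up for the example.

```julia
# Toy byte-pair-encoding merge loop, just to illustrate the algorithm.
# NOT the BytePairEncoding.jl API; all names here are invented.

# Count adjacent symbol pairs in the current tokenization.
function pair_counts(tokens::Vector{String})
    counts = Dict{Tuple{String,String},Int}()
    for i in 1:length(tokens)-1
        p = (tokens[i], tokens[i+1])
        counts[p] = get(counts, p, 0) + 1
    end
    return counts
end

# Replace every occurrence of `pair` with the merged symbol.
function apply_merge(tokens::Vector{String}, pair::Tuple{String,String})
    out = String[]
    i = 1
    while i <= length(tokens)
        if i < length(tokens) && (tokens[i], tokens[i+1]) == pair
            push!(out, pair[1] * pair[2])
            i += 2
        else
            push!(out, tokens[i])
            i += 1
        end
    end
    return out
end

# Greedily learn `nmerges` merges from a single string.
function learn_bpe(text::String, nmerges::Int)
    tokens = string.(collect(text))   # start from single characters
    merges = Tuple{String,String}[]
    for _ in 1:nmerges
        counts = pair_counts(tokens)
        isempty(counts) && break
        best = argmax(counts)          # most frequent adjacent pair
        push!(merges, best)
        tokens = apply_merge(tokens, best)
    end
    return merges, tokens
end

merges, tokens = learn_bpe("low lower lowest", 5)
@show merges
@show tokens
```

The real packages of course ship pretrained merge tables and byte-level handling; the point of the sketch is just that the core algorithm is small, which makes the lack of attention these packages get all the more puzzling.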

GitHub - brian-j-smith/Mamba.jl: Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia is a package for Bayesian statistics (last commit 5 years ago), not language models. I couldn’t find any Julia implementation of Mamba, the language model architecture.

5 Likes

Thanks for the answer.

I am now playing with jjama3 (GitHub - MurrellGroup/Jjama3.jl), which is a nice, small implementation. I am not sure how performant it is, but it is good enough for my application for now. I wanted to try state-space models, though, because I want to take a gradient through the autoregressive generation of text, which might be easier with state-space models.
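
To make the state-space point a bit more concrete: an SSM layer carries a fixed-size hidden state through a linear recurrence, so generating a sequence is just a scan over that state rather than attending over an ever-growing context, which is what makes differentiating through generation look more tractable. Here is a toy diagonal recurrence in plain Julia; it is purely illustrative, not taken from any existing package, and it ignores the discretization and selectivity tricks that real S4 / Mamba layers use.

```julia
# Toy diagonal state-space recurrence: x_t = A .* x_{t-1} + B * u_t,  y_t = real(C * x_t)
# Purely illustrative; not any particular package's implementation.

struct ToySSM
    A::Vector{ComplexF64}   # diagonal state transition (|A| < 1 for stability)
    B::Matrix{ComplexF64}   # input projection  (state_dim × input_dim)
    C::Matrix{ComplexF64}   # output projection (output_dim × state_dim)
end

function (m::ToySSM)(us::Vector{<:AbstractVector})
    x = zeros(ComplexF64, length(m.A))      # fixed-size recurrent state
    ys = Vector{Vector{Float64}}()
    for u_t in us
        x = m.A .* x .+ m.B * u_t           # state update, same cost every step
        push!(ys, real.(m.C * x))           # readout at this step
    end
    return ys
end

state_dim, in_dim, out_dim, T = 8, 4, 4, 16
m = ToySSM(0.9 .* cis.(2π .* rand(state_dim)),        # stable poles on a shrunk unit circle
           randn(ComplexF64, state_dim, in_dim),
           randn(ComplexF64, out_dim, state_dim))
ys = m([randn(in_dim) for _ in 1:T])
@show length(ys) length(ys[1])
```

With a recurrence like this, the whole generation loop is just repeated application of the same fixed-size update, instead of a growing KV cache as in a Transformer.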

In the world of LLMs, we (I mean the Julia community) lack the critical mass.

3 Likes

FYI: HRM seems to be the kind of sequence/LLM model you want, implemented in Julia (or usable from Python):

1 Like

My 2 cents on why nobody has done it in a way that goes beyond toys and kept maintaining it: the DL ecosystem is constantly shifting.

Sure, we can have a Julia CPU implementation, but what’s the point when llama.cpp implements the latest and greatest things so quickly:

Then one might say: maybe the advantage in Julia is that we would have one implementation that also works on GPU. Too bad we’re still in the middle of the Flux / Lux split.

Next idea: maybe we can have a very readable / explainable / hackable implementation? Nope, the Flux / Lux split again, so it won’t be readable / hackable, because you will have to manually write a bunch of things; for example, many Transformers.jl things won’t just work :tm:: Please support Lux.jl · Issue #170 · chengchingwen/Transformers.jl · GitHub

1 Like

I think another reason is just the relative immaturity of the GPU stack. The GPU maintainers do an amazing job given how few of them there are, but it remains the case that if you were to do some very expensive training runs, something tried and true makes more sense. E.g. Julia keeps allocations alive in presence of try/catch · Issue #2197 · JuliaGPU/CUDA.jl · GitHub has been open for years (and likely plagued runs for years before that; at least I have definitely seen bad GPU memory management in similar code where I was never able to track down the issue, prior to that one being filed). I have also seen the exact same model implemented in PyTorch and Julia, where the PyTorch version was able to train with twice the batch size on the same hardware. That’s basically 2x faster runs and half the cost. And with PyTorch you get nice distributed data loaders “for free” to keep the model fed with very few lines of code, plus all the other niceties of a very mature ecosystem. This was a year or two ago with Flux + CUDA + Zygote; potentially Lux + Enzyme + Reactant behave better, I’m not sure.
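
For readers who haven’t hit it, the linked issue is about Julia keeping locals rooted across try/catch, so the CUDA.jl memory pool cannot reuse their buffers as early as you would expect. Below is my rough paraphrase of the kind of pattern involved and a common workaround (freeing large temporaries explicitly with `CUDA.unsafe_free!`, or keeping try/catch out of the hot loop); it is a sketch, not the issue’s actual reproducer.

```julia
using CUDA

# The problematic shape: a big GPU temporary allocated inside try/catch can
# stay alive longer than expected, so the pool can't hand its memory back.
function step_with_try(x::CuArray)
    try
        tmp = x .* 2                # large temporary on the GPU
        return sum(tmp)
    catch err
        @warn "step failed" err
        return zero(eltype(x))
    end
end

# A common mitigation: free big temporaries explicitly (or keep try/catch
# out of the innermost training loop entirely).
function step_explicit_free(x::CuArray)
    tmp = x .* 2
    s = sum(tmp)
    CUDA.unsafe_free!(tmp)          # hand the buffer back to the pool right away
    return s
end

x = CUDA.rand(Float32, 10_000_000)
step_with_try(x)                    # the pattern that can pin memory
for _ in 1:100
    step_explicit_free(x)
end
```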

4 Likes