Sequence language models in Julia

Dear All,

Has anyone implemented / ported sequence models like Mamba, Hyena, or S4 / S5 to Julia? I would be especially interested in already trained models.

Thanks a lot in advance for any answers.
Tomas

4 Likes

I’m also interested in this.

Seems like Julia doesn’t have much going on in terms of training or inference for LLMs in general.

For example, “Julia language llama gpt” returns this thread from 2023: LLaMA in Julia? - #12 by ImreSamu. It mentions GitHub - cafaxo/Llama2.jl: Julia package for inference and training of Llama-style language models as a pure-Julia implementation of LLaMA. As of today, the last commit was a year ago (42 commits total) and the repo has 140 stars. IMO this is extremely low for an LLM project given the current hype. I don’t think the package is “bad”; I think very few people know about it or are using it. Meanwhile, the PyTorch-based HuggingFace Transformers is widely used for running LLMs.

GitHub - rai-llc/LanguageModels.jl: Load nanoGPT-style transformers in Julia. Code ported from @karpathy's llama2.c was last active 2 years ago and has 61 stars. Meanwhile GitHub - ggml-org/llama.cpp: LLM inference in C/C++ has 82 thousand stars. Why is it C++ instead of Julia? This is more of a rhetorical question…

A substantial LLM ecosystem (GitHub - chengchingwen/Transformers.jl: Julia Implementation of Transformer models, GitHub - chengchingwen/BytePairEncoding.jl: Julia implementation of Byte Pair Encoding for NLP, GitHub - chengchingwen/NeuralAttentionlib.jl: Reusable functionality for defining custom attention/transformer layers, etc.) is maintained by chengchingwen. The Transformers repo has a respectable 500+ stars; the rest have barely a hundred, even though their functionality is absolutely fundamental. BytePairEncoding.jl is basically Julia’s only alternative to tiktoken (GitHub - openai/tiktoken: tiktoken is a fast BPE tokeniser for use with OpenAI's models., 15k stars), so why isn’t it extremely popular? I’m using it right now, and it seems to be the only way of accessing ChatGPT’s tokenizers locally from Julia. Why doesn’t it have the thousands of stars it deserves? Maybe Julia programmers aren’t that much into LLMs? Why? “Everyone else” seems to be extremely excited about them.
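
For context on what these tokenizer packages do under the hood, here is a tiny, self-contained sketch of the BPE idea in plain Julia. This is only an illustration of the algorithm, not the BytePairEncoding.jl (or tiktoken) API; all names in it are made up for the example.

```julia
# Toy byte-pair-encoding merge loop, just to illustrate the algorithm.
# NOT the BytePairEncoding.jl API; all names here are invented.

# Count adjacent symbol pairs in the current tokenization.
function pair_counts(tokens::Vector{String})
    counts = Dict{Tuple{String,String},Int}()
    for i in 1:length(tokens)-1
        p = (tokens[i], tokens[i+1])
        counts[p] = get(counts, p, 0) + 1
    end
    return counts
end

# Replace every occurrence of `pair` with the merged symbol.
function apply_merge(tokens::Vector{String}, pair::Tuple{String,String})
    out = String[]
    i = 1
    while i <= length(tokens)
        if i < length(tokens) && (tokens[i], tokens[i+1]) == pair
            push!(out, pair[1] * pair[2])
            i += 2
        else
            push!(out, tokens[i])
            i += 1
        end
    end
    return out
end

# Greedily learn `nmerges` merges from a single string.
function learn_bpe(text::String, nmerges::Int)
    tokens = string.(collect(text))   # start from single characters
    merges = Tuple{String,String}[]
    for _ in 1:nmerges
        counts = pair_counts(tokens)
        isempty(counts) && break
        best = argmax(counts)          # most frequent adjacent pair
        push!(merges, best)
        tokens = apply_merge(tokens, best)
    end
    return merges, tokens
end

merges, tokens = learn_bpe("low lower lowest", 5)
@show merges
@show tokens
```

The real packages of course ship pretrained merge tables and byte-level handling; the point of the sketch is just that the core algorithm is small, which makes the lack of attention these packages get all the more puzzling.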

GitHub - brian-j-smith/Mamba.jl: Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia is a package for Bayesian statistics (last commit 5 years ago), not language models. I couldn’t find any Julia implementation of Mamba, the language model architecture.

5 Likes

Thanks for the answer.

I am now playing with jjama3 (GitHub - MurrellGroup/Jjama3.jl), which is a nice, small implementation. I am not sure how performant it is, but it is good enough for my application for now. I wanted to try state-space models, though, because I want to take a gradient through the autoregressive generation of text, which might be easier with state-space models.
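
To make the state-space point a bit more concrete: an SSM layer carries a fixed-size hidden state through a linear recurrence, so generating a sequence is just a scan over that state rather than attending over an ever-growing context, which is what makes differentiating through generation look more tractable. Here is a toy diagonal recurrence in plain Julia; it is purely illustrative, not taken from any existing package, and it ignores the discretization and selectivity tricks that real S4 / Mamba layers use.

```julia
# Toy diagonal state-space recurrence: x_t = A .* x_{t-1} + B * u_t,  y_t = real(C * x_t)
# Purely illustrative; not any particular package's implementation.

struct ToySSM
    A::Vector{ComplexF64}   # diagonal state transition (|A| < 1 for stability)
    B::Matrix{ComplexF64}   # input projection  (state_dim × input_dim)
    C::Matrix{ComplexF64}   # output projection (output_dim × state_dim)
end

function (m::ToySSM)(us::Vector{<:AbstractVector})
    x = zeros(ComplexF64, length(m.A))      # fixed-size recurrent state
    ys = Vector{Vector{Float64}}()
    for u_t in us
        x = m.A .* x .+ m.B * u_t           # state update, same cost every step
        push!(ys, real.(m.C * x))           # readout at this step
    end
    return ys
end

state_dim, in_dim, out_dim, T = 8, 4, 4, 16
m = ToySSM(0.9 .* cis.(2π .* rand(state_dim)),        # stable poles on a shrunk unit circle
           randn(ComplexF64, state_dim, in_dim),
           randn(ComplexF64, out_dim, state_dim))
ys = m([randn(in_dim) for _ in 1:T])
@show length(ys) length(ys[1])
```

With a recurrence like this, the whole generation loop is just repeated application of the same fixed-size update, instead of a growing KV cache as in a Transformer.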

In the world of LLMs, we (I mean the Julia community) lack the critical mass.

3 Likes

FYI: HRM seems to be the kind of sequence/LLM model you want, implemented in Julia (or usable from Python):

1 Like

My 2 cents on why nobody has done it in a way that goes beyond toys and kept maintaining it: the DL ecosystem is constantly shifting.

Sure, we can have a Julia CPU implementation, but what’s the point when llama.cpp implements the latest and greatest things so quickly:

Then one might say: maybe the advantage in Julia is that we would have one implementation that also works on GPU. Too bad we’re still in the middle of the Flux / Lux split.

Next idea: maybe we can have a very readable / explainable / hackable implementation? Nope, the Flux / Lux split again, so it won’t be readable / hackable, because you will have to manually write a bunch of things; for example, many Transformers.jl things won’t just work :tm:: Please support Lux.jl · Issue #170 · chengchingwen/Transformers.jl · GitHub

1 Like

I think another reason is just the relative immaturity of the GPU stack. The GPU maintainers do an amazing job given how few of them there are, but it remains the case that if you were to do some very expensive training runs, something tried and true makes more sense. E.g. Julia keeps allocations alive in presence of try/catch · Issue #2197 · JuliaGPU/CUDA.jl · GitHub has been open for years (and likely plagued runs for years before that; at least I have definitely seen bad GPU memory management in similar code where I was never able to track down the issue, prior to that one being filed). I have also seen the exact same model implemented in PyTorch and Julia, where the PyTorch version was able to train with twice the batch size on the same hardware. That’s basically 2x faster runs and half the cost. And with PyTorch you get nice distributed data loaders “for free” to keep the model fed with very few lines of code, plus all the other niceties of a very mature ecosystem. This was a year or two ago with Flux + CUDA + Zygote; potentially Lux + Enzyme + Reactant behave better, I’m not sure.
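
For readers who haven’t hit it, the linked issue is about Julia keeping locals rooted across try/catch, so the CUDA.jl memory pool cannot reuse their buffers as early as you would expect. Below is my rough paraphrase of the kind of pattern involved and a common workaround (freeing large temporaries explicitly with `CUDA.unsafe_free!`, or keeping try/catch out of the hot loop); it is a sketch, not the issue’s actual reproducer.

```julia
using CUDA

# The problematic shape: a big GPU temporary allocated inside try/catch can
# stay alive longer than expected, so the pool can't hand its memory back.
function step_with_try(x::CuArray)
    try
        tmp = x .* 2                # large temporary on the GPU
        return sum(tmp)
    catch err
        @warn "step failed" err
        return zero(eltype(x))
    end
end

# A common mitigation: free big temporaries explicitly (or keep try/catch
# out of the innermost training loop entirely).
function step_explicit_free(x::CuArray)
    tmp = x .* 2
    s = sum(tmp)
    CUDA.unsafe_free!(tmp)          # hand the buffer back to the pool right away
    return s
end

x = CUDA.rand(Float32, 10_000_000)
step_with_try(x)                    # the pattern that can pin memory
for _ in 1:100
    step_explicit_free(x)
end
```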

4 Likes