Sequence language models in Julia

Dear All,

Has anyone implemented or ported sequence models like Mamba, Hyena, or S4/S5 to Julia? I would be especially interested in already-trained (pretrained) models.

Thanks a lot in advance for any answers.
Tomas

2 Likes

I’m also interested in this.

Seems like Julia doesn’t have much going on in terms of training or inference for LLMs in general.

For example, “Julia language llama gpt” returns this thread from 2023: LLaMA in Julia? - #12 by ImreSamu. It mentions GitHub - cafaxo/Llama2.jl: Julia package for inference and training of Llama-style language models as a pure-Julia implementation of LLaMA. As of today, the last commit was a year ago (42 commits total) and the repo has 140 stars. IMO this is extremely low for an LLM project given the current hype. I don’t think the package is “bad”; I think very few people know about it or use it. Meanwhile, the PyTorch-based HuggingFace Transformers library is widely used for running LLMs.

GitHub - rai-llc/LanguageModels.jl: Load nanoGPT-style transformers in Julia. Code ported from @karpathy's llama2.c was last active 2 years ago and has 61 stars. Meanwhile GitHub - ggml-org/llama.cpp: LLM inference in C/C++ has 82 thousand stars. Why is it C++ instead of Julia? This is more of a rhetorical question…

A substantial LLM ecosystem (GitHub - chengchingwen/Transformers.jl: Julia Implementation of Transformer models, GitHub - chengchingwen/BytePairEncoding.jl: Julia implementation of Byte Pair Encoding for NLP, GitHub - chengchingwen/NeuralAttentionlib.jl: Reusable functionality for defining custom attention/transformer layers. etc.) is maintained by chengchingwen. The Transformers.jl repo has a respectable 500+ stars; the rest have barely a hundred, even though their functionality is absolutely fundamental. BytePairEncoding.jl is basically Julia’s only alternative to tiktoken (GitHub - openai/tiktoken: tiktoken is a fast BPE tokeniser for use with OpenAI's models., 15k stars), so why isn’t it extremely popular? I’m using it right now; it seems to be the only way of accessing ChatGPT’s tokenizers locally from Julia. Why doesn’t it have the thousands of stars it deserves? Maybe Julia programmers aren’t that much into LLMs? Why? “Everyone else” seems to be extremely excited about them.
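For anyone who wants to try it, here is a minimal sketch of tokenizing locally with BytePairEncoding.jl. It follows the `load_tiktoken` loader shown in the package README; the exact return value (token strings vs. integer ids) may differ between versions, so treat it as illustrative rather than authoritative.

```julia
# Minimal sketch: load an OpenAI BPE tokenizer locally via BytePairEncoding.jl.
using BytePairEncoding

enc = BytePairEncoding.load_tiktoken("cl100k_base")  # encoding used by GPT-3.5/GPT-4
tokens = enc("Sequence models in Julia")             # runs locally; the BPE vocabulary may be fetched once on first use
@show tokens length(tokens)
```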

GitHub - brian-j-smith/Mamba.jl: Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia is a package for Bayesian statistics (last commit 5 years ago), not a language model. I couldn’t find any Julia implementation of the Mamba language-model architecture.

3 Likes

Thanks for the answer.

I am now playing with Jjama3 (GitHub - MurrellGroup/Jjama3.jl), which is a nice, small implementation. I am not sure how performant it is, but it is good enough for my application for now. Still, I wanted to try state-space models, because I want to take a gradient through the autoregressive generation of text, which might be easier with state-space models (see the sketch below).
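To make that last point concrete, here is a toy sketch (my own, not code from Mamba, S4, or Jjama3) of differentiating through an unrolled linear state-space recurrence with Zygote.jl. Because the state has a fixed size, reverse-mode AD only walks back through a chain of small matrix products rather than a growing attention cache. The function name, dimensions, and toy objective are made up for illustration, and the discrete token-sampling step of real autoregressive generation is left out (it needs a relaxation or score-function estimator of its own).

```julia
# Toy illustration, not an implementation of Mamba/S4: a discrete linear SSM
#   x_t = A*x_{t-1} + B*u_t,   y_t = C*x_t
# unrolled over a "generated" sequence, with a scalar loss on the outputs.
using LinearAlgebra
using Zygote   # reverse-mode AD; assumed available

function ssm_step_loss(A, B, C, u_seq, x0)
    x = x0
    loss = zero(eltype(x0))
    for t in axes(u_seq, 2)
        x = A * x + B * u_seq[:, t]   # fixed-size recurrent state update
        y = C * x                     # per-step readout (stand-in for logits)
        loss += sum(abs2, y)          # toy objective; no array mutation, so Zygote-friendly
    end
    return loss
end

d, n, T = 4, 8, 16                    # input dim, state dim, sequence length
A = 0.1 .* randn(n, n); B = randn(n, d); C = randn(d, n)
u = randn(d, T); x0 = zeros(n)

# Gradients w.r.t. the SSM parameters through the whole unrolled recurrence:
gA, gB, gC = Zygote.gradient((A, B, C) -> ssm_step_loss(A, B, C, u, x0), A, B, C)
```

Swapping in a real selective SSM (Mamba-style gating) or feeding the model’s own sampled output back in as the next input is where the actual work is; the point here is only that the backward pass stays cheap and fixed-size per generated token.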

In the world of LLMs, we (I mean the Julia community) lack the critical mass.

2 Likes