LLaMA in Julia?

LLaMA.cpp [1] has been getting a lot of attention on Hacker News [2] for its ability to run a Large Language Model (LLM) on any recent CPU with modest memory requirements. I’ve been meaning to get a better understanding of LLMs, so porting LLaMA over to Julia and being able to run it on my laptop seems like a good way to do that.

Has anyone already started a similar project, or does anyone have thoughts on this? I skimmed the C++ code, and from what I can tell it looks fairly straightforward and a good fit for Julia.

[1] GitHub - ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++
[2] Using LLaMA with M1 Mac and Python 3.11 | Hacker News

  • Plain C/C++ implementation without dependencies

ggml.c: 10502 lines (8554 loc) · 316 KB
GGML Tensor Library

Perhaps picoGPT would be simpler?

picoGPT is an unnecessarily tiny and minimal implementation of GPT-2 in plain NumPy. The entire forward pass code is 40 lines of code.
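
To give a feel for why a forward pass can be that short, here is a rough sketch of a single-head causal self-attention step, which is the core operation of a GPT-2 block. This is not picoGPT's code: it uses plain Python lists instead of NumPy so it runs without dependencies, and the function names and the toy input are made up for illustration.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(q, k, v):
    """Single-head attention over lists of vectors; position i sees only positions <= i."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [
            sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out.append([
            sum(w * v[j][c] for j, w in enumerate(weights))
            for c in range(len(v[0]))
        ])
    return out

# Toy input: 3 tokens with 2-dimensional vectors (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x, x, x)
```

A full GPT-2 block adds learned projections, multiple heads, layer norm, and an MLP around this step, and would use matrix multiplies rather than Python loops, but the overall structure stays small.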