llama.cpp has been getting a lot of attention on Hacker News for its ability to run a Large Language Model (LLM) on any recent CPU with modest memory requirements. I’ve been meaning to get a better understanding of LLMs, so porting LLaMA over to Julia and being able to run it on my laptop seems like a good way to do that.
Has anyone else already started a similar project or have any thoughts? I briefly went through the C++ code and it looks fairly straightforward and a good fit for Julia from what I can tell.
 GitHub - ggerganov/llama.cpp: Port of Facebook's LLaMA model in C/C++
 Using LLaMA with M1 Mac and Python 3.11 | Hacker News
- Plain C/C++ implementation without dependencies
ggml.c: 10502 lines (8554 loc) · 316 KB
GGML Tensor Library
Perhaps picoGPT would be simpler?
picoGPT is an unnecessarily tiny and minimal implementation of GPT-2 in plain NumPy. The entire forward pass code is 40 lines of code.
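To give a feel for why the forward pass can be so small, here is a sketch of the core of a GPT-style block in plain NumPy: causal self-attention with a residual connection. This is a simplified illustration (layer norm, multi-head splitting, and the MLP are omitted), not picoGPT's actual code; the weights are random placeholders.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(q, k, v, mask):
    # scaled dot-product attention; the mask blocks attention to future tokens
    return softmax(q @ k.T / np.sqrt(q.shape[-1]) + mask) @ v

def gpt_block(x, wq, wk, wv, wo):
    # one causal self-attention block (layer norm and MLP omitted for brevity)
    n = x.shape[0]
    mask = (1 - np.tri(n)) * -1e10  # large negative above the diagonal
    out = attention(x @ wq, x @ wk, x @ wv, mask)
    return x + out @ wo  # residual connection

# toy usage with random weights: 4 tokens, embedding dimension 8
rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(4, d))
w = [rng.normal(size=(d, d)) * 0.1 for _ in range(4)]
y = gpt_block(x, *w)
print(y.shape)  # (4, 8)
```

A full GPT-2 adds token/position embeddings, layer norms, and a feed-forward network per block, but it is all just matrix multiplies like these, which is why a NumPy or Julia port is so compact.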
any updates on this? I am also interested in using Julia for LLMs.
I haven’t had time to do anything, but it looks like someone is working on a project in Julia:
thank you - I have seen it!
This is a great endeavour. Please do not take the following comments as criticism, but as suggestions for what to do next.
In the end, though, the question will be speed. How does the implementation compare to that in
Also, I recently had the pleasure of using
Transformers.jl and was impressed by the package. It implements the Python counterpart with Julia flexibility, which is just nice. Unfortunately, it is about half the performance (speed) of the Python version. So the next question would be how a tiny implementation of llama compares to an implementation with
Transformers.jl, which might not exist yet. I think these are important questions, because while these tiny libraries are incredible for showing versatility, Transformers.jl should be the go-to package for LLMs, and we should learn tricks from these small packages to improve it.
Private confession: for some experiments, I had to use the Python counterpart, because the Falcon class of models uses a tokenizer which is not supported in Julia.
I am the author of Llama2.jl. It currently runs the Llama2 7B model (q4_K_S GGML quantization) at 9 tokens/second (it slows down to about 7 tokens/second as it approaches sequence length 512).
llama.cpp runs the same model at about 14 tokens/second. (all on an M1 Air)
As far as I know, llama2.c does not support loading GGML weights at all yet.
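For readers unfamiliar with what "q4_K_S GGML quantization" refers to: the actual format is more involved (weights are packed into super-blocks with their own sub-block scales), but the core idea of block-wise 4-bit quantization can be sketched in a few lines. This is a simplified symmetric scheme for illustration only, not the real GGML layout.

```python
import numpy as np

QK = 32  # simplified block size; the real q4_K format uses larger super-blocks

def quantize_q4(w):
    """Symmetric 4-bit block quantization (illustration, not q4_K_S)."""
    w = w.reshape(-1, QK)
    # one float scale per block, mapping the block's range onto [-7, 7]
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_q4(q, scale):
    # recover approximate float weights from 4-bit codes and per-block scales
    return (q.astype(np.float32) * scale).reshape(-1)

rng = np.random.default_rng(1)
w = rng.normal(size=64).astype(np.float32)
q, s = quantize_q4(w)
w2 = dequantize_q4(q, s)
err = np.abs(w - w2).max()
```

The payoff is memory: each weight shrinks from 32 bits to roughly 4 bits plus a shared per-block scale, which is what lets a 7B model fit on a laptop. On the speed side, the matrix multiply then has to dequantize (or use integer kernels) on the fly, which is where implementations like llama.cpp spend most of their optimization effort.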
Why is it about half the speed compared to the C version? What is the secret sauce they use?
I wanted to try to load Llama2 into Transformers.jl, but I got a
GatedRepo error, because I have not accepted the license.
-Ofast -march=native ....
I was comparing speed against llama.cpp, not llama2.c.
I do not yet know what causes the difference.
OPT = -Ofast
CFLAGS += -march=native -mtune=native
CXXFLAGS += -march=native -mtune=native