This state-of-the-art library has Rust and Python APIs, but ironically no high-level C++ API despite being written in C++:
I’m guessing C-like C++ is kinda necessary for bindings to other [languages]
[TensorFlow, also written in C++, officially had only a stable Python API (while the Julia API was arguably better, until it went unmaintained).]
If we want to call this library, we certainly can, via its Python API through PythonCall.jl; calling its Rust API is also an option. So which would you prefer?
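To make the PythonCall.jl route concrete, here's a minimal sketch going through the llama-cpp-python bindings (assuming they're installed in the active Python environment, e.g. via CondaPkg; the model path is a placeholder):

```julia
using PythonCall

# llama-cpp-python wraps llama.cpp; its `Llama` class loads a GGUF file.
llama_cpp = pyimport("llama_cpp")
llm = llama_cpp.Llama(model_path = "models/llama-2-7b.Q4_K_M.gguf")  # placeholder path

# The model object is callable; keyword arguments pass straight through to Python.
out = llm("Julia is a programming language that"; max_tokens = 32)
println(pyconvert(String, out["choices"][0]["text"]))
```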
You might ask why not just use Llama2.jl, which already supports:
- GGUF models: Llama 2, Llama 3, and Phi-3 (not all quantization variants may work)
- Andrej Karpathy’s llama2.c format
Note that it's only the format: Karpathy's excellent llama2.c isn't actually used, nor would it suffice if it were. (Current usage looks roughly like the sketch below.)
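I'm going from memory of the Llama2.jl README here, so treat the function names (`load_gguf_model`, `load_karpathy_model`, `sample`) as assumptions and the model paths as placeholders:

```julia
using Llama2

# GGUF route, e.g. a quantized Llama 2 or Phi-3 checkpoint:
model = load_gguf_model("phi-3-mini-4k-instruct-q4.gguf")
sample(model, "Tell me a story."; temperature = 0.9f0)

# llama2.c route: Karpathy's .bin weights plus his tokenizer file:
model = load_karpathy_model("stories42M.bin", "tokenizer.bin")
sample(model, "Once upon a time")
```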
llama.cpp supports all the GGUF models and all the quantization types, and I'm not sure there's any real alternative.
So why is Llama[2].jl being built from scratch in pure Julia? It's great that you can, but is it really needed, or even wanted? I think we should reuse great code.
I think Julia could well be the high-level API for end users (but also for other languages, i.e. replacing C++ as the implementation language).
Until then, should we rather be wrapping Rust, or Python (in general, not just for this library)?
Wrapping C++ is possible but famously annoying, and whether you can hardly matters if programs (increasingly?) don't provide a C++ API in the first place. Rust also has some issues (or just this one, which is solvable?): it can rearrange struct fields (good for Rust, for performance; bad for languages wrapping Rust). You then have to forbid the reordering, which is possible with a "C-like API" via `#[repr(C)]`; though in most cases that isn't even needed, since you just wrap the API rather than expose the structs (see the sketch below).
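To make that struct point concrete: the usual fix is `#[repr(C)]` plus an `extern "C"` function on the Rust side, after which Julia can call it like any C library. Everything below (library name, function, struct) is hypothetical:

```julia
# Hypothetical Rust side, for reference:
#     #[repr(C)]                      // forbids field reordering
#     pub struct Params { pub temperature: f32, pub max_tokens: i32 }
#     #[no_mangle]
#     pub extern "C" fn lm_generate(p: Params) -> i32 { /* ... */ }

# Julia mirror of the #[repr(C)] struct; the layouts now match by construction.
struct Params
    temperature::Cfloat
    max_tokens::Cint
end

# Call through the C ABI; "liblm" is a made-up shared library name.
n = @ccall "liblm".lm_generate(Params(0.9f0, 32)::Params)::Cint
```

Without `#[repr(C)]`, the Rust compiler is free to reorder those fields, and the Julia struct above would silently read garbage; with it, the wrapper is no harder than wrapping C.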