I have trained a model in Python/Keras. What is the recommended way, with the present Julia ML ecosystem, to save it to disk and load it into Julia to do inference (no training)?
A similar question was asked in 2020. One answer was “export the model to onnx”. Is this still a valid answer? There are also recent similar questions, but they concern PyTorch.
loading and saving objects in Python Pickle and Torch Pickle format.
[…]
We also support loading/saving the tensor data from/for pytorch. […]
From the name of the package it’s not obvious that it has anything to do with neural networks, until you read further. Note that Pickle is Python’s general serialization format; it is not meant to be read by other languages, nor even to be used for everything in Python, since loading untrusted pickle files is a potential security issue (e.g. for downloaded neural networks).
I don’t know about your specific case; it might not use Pickle, though it could potentially be converted to it. If you read such a file with Pickle.jl then there’s no security issue, since it doesn’t support all of pickle (and likely never will), but, as I understand it, enough for (some) neural networks.
You can actually read all Pickle files, and all Keras files, by calling into CPython (i.e. using, say, PythonCall.jl), but then it’s not Julia, or Julia code, doing it; you’re only instructing Python to do it, and thus everything Python can do is supported. The same goes for the inference, but again, you’re then just using Python and Keras from Julia. A sketch of that approach is below.
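Here is a minimal sketch of that route; the file name `model.h5` and the input shape are placeholders for your actual model:

```julia
# Minimal sketch: load a Keras model and run inference from Julia via
# PythonCall.jl. Assumes the model was saved as "model.h5" and takes a
# (batch, features) Float32 input; adjust names and shapes to your model.
using PythonCall

np = pyimport("numpy")
tf = pyimport("tensorflow")

model = tf.keras.models.load_model("model.h5")

x = rand(Float32, 1, 10)          # hypothetical input: 1 sample, 10 features
y = model.predict(np.asarray(x))  # the inference itself runs in Python/Keras
yj = pyconvert(Array, y)          # back to a Julia Array for further computation
```

The result `yj` is then an ordinary Julia array you can feed into the rest of your Julia code.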
Some older discussions I found, maybe also helpful:
That might still be true, or it might be outdated. Do you know something for sure that contradicts the above, or did you just not know of it? I’m not sure it has been announced yet.
I should have mentioned that the results of the inference would be combined with other computations for which I have Julia code. This is why I am not considering a pure Python solution. But I am open to doing the inference via PyCall if that is a robust approach.
If I save the model to Pickle, I see how to load it, but how would I use it?
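One possibility, sketched below, is to pickle just the weights on the Python side (e.g. `pickle.dump(model.get_weights(), f)`), read them with Pickle.jl’s `npyload`, and copy them into a Flux model that re-declares the same architecture. The file name, shapes, and single-`Dense`-layer architecture here are all hypothetical:

```julia
# Rough sketch: load pickled Keras weights and use them in a Flux model.
# Assumes "weights.pkl" holds `model.get_weights()` for a hypothetical
# Dense layer with 10 inputs, 5 outputs, and relu activation.
using Pickle, Flux

weights = Pickle.npyload("weights.pkl")  # numpy arrays come back as Julia arrays
W, b = weights[1], weights[2]

m = Dense(10 => 5, relu)    # re-declare the same architecture in Flux
m.weight .= permutedims(W)  # Keras stores Dense kernels as (in, out); Flux wants (out, in)
m.bias .= b

y = m(rand(Float32, 10))    # inference in pure Julia
```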
I have found FluxML/ONNX.jl. Based on an example, this seems to work despite the statement “ONNX.jl is in the process of a total reconstruction”.
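For reference, here is roughly what that looks like, following the example in the ONNX.jl README at the time of writing (the API may change given the ongoing reconstruction). It assumes the Keras model has been exported to `model.onnx` (e.g. with the `tf2onnx` Python package); the file name and input shape are placeholders:

```julia
# Rough sketch: run an ONNX-exported model in Julia with ONNX.jl.
import ONNX
using Umlaut: play!

x = rand(Float32, 10, 1)           # dummy input that fixes the shapes (features, batch)
tape = ONNX.load("model.onnx", x)  # load the graph as an executable Umlaut tape
y = play!(tape, x)                 # run inference, in Julia
```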
You might rather want to use PythonCall.jl; doing it that way should be very similar.
Regarding ONNX, is it still much used? I’m trying to find out which file formats are most needed for Julia to support, and I’m not sure we even have the final best format yet.
What I think is most important now is the new GGUF file format (and the older, compatible GGML, which seems much used):
GGUF and GGML are file formats used for storing models for inference, particularly in the context of language models like GPT (Generative Pre-trained Transformer). Let’s break down the key differences, pros, and cons of each:
[…]
GGUF (GPT-Generated Unified Format)
[…]
Successor to GGML: GGUF aims to address the limitations of GGML and improve the overall user experience.
No breaking changes: GGUF seeks to eliminate breaking changes, making it easier for users to transition to new versions.
Support for various models: GGUF is not limited to llama models, making it more versatile.
Cons:
It may take some time for existing models to be converted to the GGUF format.
Users and developers need to adapt to this new format.
Why can’t both be supported?
It’s a lot of effort to maintain support for older file formats, which is why the current version of llama.cpp only supports GGUF. GGUF is intended to be a long-term solution that can easily be extended, so it should be the last new format for a long time.
There is a script in llama.cpp, called `convert-llama-ggml-to-gguf.py`, that you can use to convert LLaMA models from GGML to the new format.