Hi there, I would rather not revive the 2-year-old thread on this topic, so I am posting here instead.
I just wanted to share the solution which works for us. In our case, the julia code is called from a C/C++ program, so the simplest way I found to obfuscate the code is to simply encrypt the file.
Basically the process goes a bit like this (I could share sample code if people are interested):
Writing your software
Write your julia program or module like you normally would and save the file with a “.jl” extension
Encrypt the file using your favourite method (in our case of simple obfuscation, a symmetric XOR cipher is used). Save the encrypted file with a “.encryptedjl” extension
Packaging your software
Write a small C/C++ program that reads the .encryptedjl file from disk, decrypts it into a buffer in memory, and evaluates that buffer with jl_eval_string.
Compile the code into an application, linking with libjulia.so
Distributing your software
Install julia on the target machine
Distribute your program binary, as well as the .encryptedjl file
Updating the software
Since the julia installation and your julia program are separate from the compiled binary, they can be updated independently. Simply copy the new version of julia, or the new version of the .encryptedjl file, to the target machine to update the software.
I am also interested in the relative tradeoffs of this approach compared to using PackageCompiler to include the custom code into the Julia executable.
Every package in Julia is precompiled by default the first time it is loaded (you may notice the “precompiling…” messages when you are developing the package). The corresponding cache file is stored as a .ji file under the ~/.julia directory, so you just need to find this and ship it, and write some custom loading machinery around Base._require_from_serialized.
Note that by default the precompiled files do embed the source code (useful in case it’s lost, which did happen to someone), but I think that’s optional and there should be a way to not embed the source code.
I think this would be very interesting. In our case the julia code still needs to be called from a C program, so I am not sure it would apply, but for a pure julia application it seems like it could work in theory.
I believe loading a .ji file isn’t really compatible with
if that is an important goal. (You can ship one .ji for each Julia version and select the right one for loading of course, but won’t help with future Julia versions.)
Yes, that was added here: https://github.com/JuliaLang/julia/pull/23898. It looks like it would be easy to strip out of a .ji file without affecting anything but development tools (Revise, debuggers, …). In fact, the only place it (Base.read_dependency_src) is currently used seems to be in Revise, and read_dependency_src already handles the possibility of missing source code.
One problem with loading .ji files is that the dump format depends on the system ABI so it’s not very easy to get this to work for library packages.
This isn’t really a problem when distributing an application as a complete bundle though. In that case each bundle will require a system dependent build step anyway.
People interested in obfuscation should also try out the new --strip-metadata option which Jeff added fairly recently in
I think that would need to be combined with building your library into the sysimage. IIUC --strip-metadata removes all local variable names, so the Julia IR will become a great deal more difficult to understand.
Do beware that --strip-metadata is an internal undocumented compiler option at the moment so it might not be a complete feature (for example, does it strip all the source information added by #23898?)
The .ji file depends on the exact Julia version (and on the .ji files of any dependencies). But as @GunnarFarneback pointed out, if you want to ship a library that works with multiple Julia versions (not just a standalone binary) you could in principle ship multiple .ji files, one for each version of Julia, and have your installer select the right one.
you could in principle ship multiple .ji files, one for each version of Julia, and have your installer select the right one
Definitely! The main difficulty is setting up and maintaining the build infrastructure on all the necessary platforms. For a small number of platforms that shouldn’t be a big problem.
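To make the "select the right one" step concrete, here is a minimal sketch of how a launcher or installer might pick a cache file. Everything in it is an assumption for illustration: the function name ji_file_for, the "Package-version.ji" naming scheme, and the idea of passing in the list of shipped versions are all hypothetical and not something Julia prescribes. Because a .ji file is tied to the exact Julia version, the sketch only accepts an exact match.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical naming scheme: one cache file per exact Julia version,
// e.g. "MyPackage-1.6.7.ji". Nothing in Julia mandates this layout.
std::string ji_file_for(const std::string& package,
                        const std::string& julia_version,
                        const std::vector<std::string>& shipped_versions) {
    // A .ji file only works with the Julia version that produced it,
    // so anything other than an exact match is treated as "not available".
    bool found = std::find(shipped_versions.begin(), shipped_versions.end(),
                           julia_version) != shipped_versions.end();
    if (!found) {
        return "";  // no matching cache: the caller has to fall back or fail
    }
    return package + "-" + julia_version + ".ji";
}
```

In an embedding program the version string could come from jl_ver_string() in julia.h. An empty result is the "future Julia version" case mentioned above, where no shipped .ji file applies.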
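#include <iostream>
#include <fstream>
#include <iterator>
#include <streambuf>
#include <string>
#include "julia.h"
#include "encryption.h"
using namespace std;
const string encryption_key = "julia";
JULIA_DEFINE_FAST_TLS // only define this once, in an executable (not in a shared library) if you want fast code.
int main (int argc, char *argv[]) {
    if (argc != 2) {
        cout << "Usage: program input_file" << endl;
        return 1;
    }
    // open in binary mode: the encrypted bytes are not text
    ifstream filestream(argv[1], ios::binary);
    if (!filestream) {
        cerr << "Could not open " << argv[1] << endl;
        return 1;
    }
    // required: setup the Julia context
    jl_init();
    // load the encrypted julia file into memory and decrypt it in place
    string filecontent( (istreambuf_iterator<char>(filestream)), istreambuf_iterator<char>() );
    encrypt_decrypt(filecontent,encryption_key);
    // evaluate the decrypted source directly from the in-memory buffer
    jl_eval_string(filecontent.c_str());
    // report a Julia exception instead of failing silently
    if (jl_exception_occurred())
        cerr << "Julia error: " << jl_typeof_str(jl_exception_occurred()) << endl;
    jl_atexit_hook(0);
    return 0;
}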
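The "encryption.h" header is not shown in the post. As a rough sketch only, a symmetric XOR routine with the same call shape as encrypt_decrypt(filecontent, encryption_key) could look like the following. The body is an assumption based on the description "a symmetric XOR cipher", not the poster's actual implementation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the encrypt_decrypt declared in "encryption.h".
// XORs every byte with the repeating key, modifying data in place.
// Applying it a second time with the same key restores the original bytes.
void encrypt_decrypt(std::string& data, const std::string& key) {
    if (key.empty()) {
        return;  // nothing to XOR with; leave the data unchanged
    }
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] = static_cast<char>(data[i] ^ key[i % key.size()]);
    }
}
```

The same function serves for both directions, so the tool that produces the .encryptedjl file can reuse it. The encrypted output may contain zero bytes, so it should be read and written as binary data and kept in a std::string rather than a C string until it has been decrypted. As discussed in the thread, this is obfuscation only: the key sits in the binary in plain form.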
That might have a grain of truth, but if the effort-to-reward ratio is too high, most attackers will give up. So what resources does the attacker have, and are they willing to spend them on your program?
I’ve always been impressed by the articles from Fabrice Desclaux and collaborators on reverse engineering the Skype binaries and protocol. For example Vanilla Skype part 1 is a great read.