Julia program obfuscation for commercial code

Hi there, I thought I'd rather not revive the two-year-old thread on this topic.

I just wanted to share the solution that works for us. In our case, the Julia code is called from a C/C++ program, so the simplest way I found to obfuscate the code is to encrypt the file.

Basically the process goes a bit like this (I could share sample code if people are interested):

Writing your software

  1. Write your Julia program or module like you normally would and save the file with a “.jl” extension
  2. Encrypt the file using your favourite method (in our case of simple obfuscation, a symmetric XOR cipher is used). Save the encrypted file with a “.encryptedjl” extension
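
For illustration, here is a minimal sketch of what “symmetric” means here: the same XOR routine both hides and restores the text. The names `xor_cipher` and `round_trips` are made up for this example and are not part of any library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: XOR every byte of `data` with a repeating key.
// Applying it twice with the same key restores the original bytes.
std::string xor_cipher(std::string data, const std::string& key) {
    assert(!key.empty());  // an empty key would mean a modulo by zero below
    for (std::string::size_type i = 0; i < data.size(); ++i) {
        data[i] = static_cast<char>(data[i] ^ key[i % key.size()]);
    }
    return data;
}

// Returns true when encrypting and then decrypting yields the input again.
bool round_trips(const std::string& source, const std::string& key) {
    const std::string hidden = xor_cipher(source, key);
    return xor_cipher(hidden, key) == source;
}
```

Because encryption and decryption are the same operation, only one routine and one key have to be shared between the tool that encrypts the file and the program that loads it.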

Packaging your software

  1. Write a small C/C++ program that reads the .encryptedjl file from disk, and decrypts it into a buffer in memory.
  2. Follow the instructions on embedding julia to execute the string (or functions).
  3. Compile the code into an application, linking with libjulia.so

Distributing your software

  1. Install julia on the target machine
  2. Distribute your program binary, as well as the .encryptedjl file

Updating the software

Since the Julia installation and your Julia program are separate from the compiled binary, they can be updated independently. Simply copy the new version of Julia, or the new version of the .encryptedjl file, to the target machine to update the software.


What’s stopping me from building a libjulia.so that dumps the source code your C++ sends it ?

Nothing, but as the phrase goes… every piece of software is open source when you know enough assembly.

Probably for a lot of use cases this will work well enough. I’m sure that is the case for the OP company.


“Locks only keep honest people out”


This is the reason why I titled the topic with “obfuscation”. At first glance 2 things are hidden:

  1. The source code itself
  2. The mechanism used to reveal it
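
To make the “obfuscation, not encryption” point concrete: a repeating-key XOR gives up its key bytes to anyone who can guess part of the plaintext. Here is a rough sketch, assuming the attacker guesses that the file starts with a common token such as `module `. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <string>

// Repeating-key XOR, the same scheme described in this thread.
std::string xor_cipher(std::string data, const std::string& key) {
    if (key.empty()) return data;  // nothing to XOR with
    for (std::string::size_type i = 0; i < data.size(); ++i) {
        data[i] = static_cast<char>(data[i] ^ key[i % key.size()]);
    }
    return data;
}

// Hypothetical helper: XOR the start of the ciphertext with plaintext the
// attacker already guesses, which yields the key bytes used at those positions.
std::string recover_key_bytes(const std::string& ciphertext,
                              const std::string& known_prefix) {
    const std::string::size_type n =
        std::min(ciphertext.size(), known_prefix.size());
    std::string key_bytes(n, '\0');
    for (std::string::size_type i = 0; i < n; ++i) {
        key_bytes[i] = static_cast<char>(ciphertext[i] ^ known_prefix[i]);
    }
    return key_bytes;
}
```

With the key `julia` and a file that starts with `module `, the recovered bytes are `juliaju`, which already exposes the whole key. So this keeps casual readers out and nothing more.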

I am interested in your sample code.

I am also interested in the relative tradeoffs of this approach compared to using PackageCompiler to include the custom code into the Julia executable.


Couldn’t you just use Base._require_from_serialized to load from the precompiled cache file, which doesn’t require the .jl source?


I’m not sure what your suggestion here is. How would you compile your .jl code and extract the precompiled version ?

Every package in Julia is precompiled by default the first time it is loaded (you may notice the “precompiling…” messages when you are developing the package). The corresponding cache file is stored as a .ji file under the ~/.julia directory, so you just need to find this and ship it, and write some custom loading machinery around Base._require_from_serialized.

Note that by default the precompiled files do embed the source code (useful in case it’s lost, it did happen to someone), but I think that’s optional and there should be a way to not embed the source code


I think this would be very interesting. In our case the julia code still needs to be called from a C program, so I am not sure it would apply, but for a pure julia application it seems like it could work in theory.

I believe loading a .ji file isn’t really compatible with

if that is an important goal. (You can ship one .ji for each Julia version and select the right one for loading of course, but won’t help with future Julia versions.)

I don’t see why not. You write your custom .ji-loading mechanism in Julia, and call it from your C code.

Yes, that was added here: https://github.com/JuliaLang/julia/pull/23898. It looks like it would be easy to strip out of a .ji file without affecting anything but development tools (Revise, debuggers, …). In fact, the only place it (Base.read_dependency_src) is currently used seems to be in Revise, and read_dependency_src already handles the possibility of missing source code.

One problem with loading .ji files is that the dump format depends on the system ABI so it’s not very easy to get this to work for library packages.

This isn’t really a problem when distributing an application as a complete bundle though. In that case each bundle will require a system dependent build step anyway.


People interested in obfuscation should also try out the new --strip-metadata option which Jeff added fairly recently in

I think that would need to be combined with building your library into the sysimage. IIUC --strip-metadata removes all local variable names, so the Julia IR will become a great deal more difficult to understand.

Do beware that --strip-metadata is an internal undocumented compiler option at the moment so it might not be a complete feature (for example, does it strip all the source information added by #23898?)


The .ji file depends on the exact Julia version (and on the .ji files of any dependencies). But as @GunnarFarneback pointed out, if you want to ship a library that works with multiple Julia versions (not just a standalone binary) you could in principle ship multiple .ji files, one for each version of Julia, and have your installer select the right one.
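
A rough sketch of what that selection step could look like in an installer or launcher written in C++. The table type, the function name and file names such as `MyPkg_1.7.1.ji` are hypothetical, and how you obtain the version string of the installed Julia is left out.

```cpp
#include <map>
#include <string>

// Hypothetical lookup table from an exact Julia version string to the
// cache file that was built with that version.
typedef std::map<std::string, std::string> CacheTable;

// Returns the cache file shipped for `julia_version`, or an empty string
// when no file was built for that exact version.
std::string select_cache_file(const CacheTable& shipped,
                              const std::string& julia_version) {
    const CacheTable::const_iterator it = shipped.find(julia_version);
    return it == shipped.end() ? std::string() : it->second;
}
```

The lookup is deliberately an exact match: since the cache file depends on the exact Julia version, falling back to a “close” version is not an option, and an empty result means that version is simply unsupported.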

you could in principle ship multiple .ji files, one for each version of Julia, and have your installer select the right one

Definitely! The main difficulty is setting up and maintaining the build infrastructure on all the necessary platforms. For a small number of platforms that shouldn’t be a big problem.

As promised, here is some C++ sample code, running on Linux with Julia installed in the /opt/julia-1.7.1/ folder.

To use the sample code:

  1. Put all files in the same folder
  2. [if needed] Edit the makefile to point to your julia installation folder
  3. Open a command prompt and run “make”
  4. Encrypt your file by running “./obfuscator something.jl something.encryptedjl”
  5. Execute the encrypted file by running “./program something.encryptedjl”

Encryption

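encryption.h
#pragma once
#include <string>

// XOR every byte of msg with the repeating key, in place.
// Calling it a second time with the same key restores the original.
void encrypt_decrypt(std::string &msg, std::string const& key);

encryption.cpp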
#include "encryption.h"
using namespace std;

// simple message XOR cipher encryption suggestion from :
// http://www.cplusplus.com/forum/windows/128374/#msg694527
void encrypt_decrypt(string &msg, string const& key)
{
    if (key.empty())
        return; // nothing to XOR with; also avoids a modulo by zero below
    for (string::size_type i = 0; i < msg.size(); ++i)
        msg[i] ^= key[i%key.size()];
}
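
One practical consequence of this scheme, sketched below under the assumption of a plain `std::string` buffer: whenever a source byte equals the key byte at that position, the XOR result is a NUL byte. The encrypted data therefore has to be handled as binary with an explicit length, never as a C string. The helper names are made up for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Same repeating-key XOR as encrypt_decrypt, repeated here so the sketch
// is self-contained.
std::string xor_bytes(std::string data, const std::string& key) {
    if (key.empty()) return data;  // nothing to XOR with
    for (std::string::size_type i = 0; i < data.size(); ++i) {
        data[i] = static_cast<char>(data[i] ^ key[i % key.size()]);
    }
    return data;
}

// Number of NUL bytes in the buffer.
std::size_t count_nul_bytes(const std::string& data) {
    return static_cast<std::size_t>(std::count(data.begin(), data.end(), '\0'));
}

// Length a C-string API would see: it stops at the first NUL byte.
std::size_t length_as_c_string(const std::string& data) {
    return std::strlen(data.c_str());
}
```

For example, encrypting the text `julia = 1` with the key `julia` turns the first five bytes into NULs, so a C-string view of the result is empty even though the buffer still holds nine bytes. After decryption the text is ordinary source code again, which is why passing the decrypted buffer as a C string is fine.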

Obfuscation program

obfuscator.cpp
#include <iostream>
#include <fstream>
#include <streambuf>

#include "encryption.h"
using namespace std;

const string encryption_key = "julia";

int main (int argc, char *argv[]) {
    if (argc < 2 || argc > 3) {
        cout << "Usage: obfuscator input_file [output_file]" << endl;
        return 1;
    }

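    // read the whole input file; binary mode keeps the bytes untouched
    ifstream ifs(argv[1], ios::binary);
    if (!ifs) {
        cerr << "Cannot open input file: " << argv[1] << endl;
        return 1;
    }
    string fileContent( (istreambuf_iterator<char>(ifs)),
                      istreambuf_iterator<char>() );

    string outputFilename = 3 == argc ? string(argv[2]) : string(argv[1]) + ".encryptedjl";

    encrypt_decrypt(fileContent, encryption_key);

    // the XOR output is arbitrary bytes, so write it in binary mode too
    ofstream ofs(outputFilename, ios::binary);
    ofs << fileContent;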

    return 0;
}

C++ program

program.cpp
#include <iostream>
#include <fstream>
#include <streambuf>

#include "julia.h"
#include "encryption.h"
using namespace std;

const string encryption_key = "julia";

JULIA_DEFINE_FAST_TLS // only define this once, in an executable (not in a shared library) if you want fast code.

int main (int argc, char *argv[]) {
    if (argc != 2) {
        cout << "Usage: program input_file" << endl;
        return 1;
    }

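    // load the encrypted julia file into memory (binary mode: it is not text)
    ifstream filestream(argv[1], ios::binary);
    if (!filestream) {
        cerr << "Cannot open input file: " << argv[1] << endl;
        return 1;
    }
    string filecontent( (istreambuf_iterator<char>(filestream)), istreambuf_iterator<char>() );

    // decrypt in memory only; the plain source is never written to disk
    encrypt_decrypt(filecontent,encryption_key);

    // required: setup the Julia context
    jl_init();

    jl_eval_string(filecontent.c_str());

    // report an uncaught Julia exception instead of failing silently
    if (jl_exception_occurred())
        cerr << "Julia error: " << jl_typeof_str(jl_exception_occurred()) << endl;

    // let Julia run its exit hooks before the program returns
    jl_atexit_hook(0);
    return 0;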
}

Makefile

Makefile
CXX=/usr/bin/g++
JULIAINCPATH=/opt/julia-1.7.1/include/julia
JULIALIBPATH=/opt/julia-1.7.1/lib
CXXOPTS=-Wall -Werror
LDOPTS=-Wl,-rpath,$(JULIALIBPATH)
PATHS=-I$(JULIAINCPATH) -L$(JULIALIBPATH)
LIBS=-ljulia

all : obfuscator program

encryption.o : encryption.cpp encryption.h
	$(CXX) -c $(CXXOPTS) $< -o $@

obfuscator : obfuscator.cpp encryption.o
	$(CXX) $(CXXOPTS) $^ -o $@

program : program.cpp encryption.o
	$(CXX) $(CXXOPTS) $(PATHS) $(LDOPTS) $^ -o $@ $(LIBS)

Doesn’t @lawless-m raise a good point there?

That might have a grain of truth, but if the effort/reward ratio is too high, most attackers will give up. So what resources does the attacker have, and are they willing to spend them on your program?

I’ve always been impressed by the articles from Fabrice Desclaux and collaborators on reverse engineering the Skype binaries and protocol. For example Vanilla Skype part 1 is a great read.
