Julia program obfuscation for commercial code

maharvey · January 25, 2022, 2:31pm

Hi there, I thought I'd rather not revive the two-year-old thread here on the topic.

I just wanted to share the solution which works for us. In our case, the julia code is called from a C/C++ program, so the simplest way I found to obfuscate the code is to simply encrypt the file.

Basically the process goes a bit like this (I could share sample code if people are interested):

Writing your software

Write your Julia program or module like you normally would and save the file as “.jl”.
Encrypt the file using your favourite method (in our case of simple obfuscation, a symmetric XOR cipher is used). Save the encrypted file with a “.encryptedjl” extension.
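To illustrate why a symmetric XOR cipher needs only one routine, here is a minimal sketch. The names `xor_with_key` and `contains_nul` are hypothetical helpers, not part of the original post. It also shows a caveat: whenever a plaintext byte equals the key byte at the same position, the output contains a NUL byte, so the encrypted data should be handled as a sized byte buffer rather than a C string.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): XOR every byte of `data`
// with a repeating key and return the result. Because (x ^ k) ^ k == x,
// calling it twice with the same key restores the input.
std::string xor_with_key(std::string data, const std::string& key)
{
    assert(!key.empty());  // an empty key would make the modulo divide by zero
    for (std::size_t i = 0; i < data.size(); ++i)
        data[i] ^= key[i % key.size()];
    return data;
}

// True if `bytes` contains a NUL byte. In XOR output this happens whenever
// a plaintext byte equals the key byte at the same position.
bool contains_nul(const std::string& bytes)
{
    return bytes.find('\0') != std::string::npos;
}
```

For example, encrypting the text "julia" with the key "julia" yields five NUL bytes, which `strlen` would report as an empty string while `std::string::size()` still reports 5.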

Packaging your software

Write a small C/C++ program that reads the .encryptedjl file from disk, and decrypts it into a buffer in memory.
Follow the instructions on embedding julia to execute the string (or functions).
Compile the code into an application, linking with libjulia.so
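The first packaging step can be sketched roughly as follows, assuming the same repeating-key XOR scheme. The helper names `xor_in_place`, `write_encrypted` and `load_decrypted` are made up for illustration. The hand-off to Julia is only indicated in a comment, since it requires linking against libjulia.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: XOR `data` in place with a repeating, non-empty key.
void xor_in_place(std::string& data, const std::string& key)
{
    assert(!key.empty());
    for (std::string::size_type i = 0; i < data.size(); ++i)
        data[i] ^= key[i % key.size()];
}

// Build-time side: encrypt `source` and write it to `path` in binary mode.
void write_encrypted(const std::string& path, std::string source,
                     const std::string& key)
{
    xor_in_place(source, key);
    std::ofstream out(path, std::ios::binary);
    if (!out)
        throw std::runtime_error("cannot open " + path);
    out.write(source.data(), static_cast<std::streamsize>(source.size()));
}

// Run-time side: read the encrypted file and decrypt it into a memory buffer.
// The plain source only ever exists in the returned string, never on disk.
std::string load_decrypted(const std::string& path, const std::string& key)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        throw std::runtime_error("cannot open " + path);
    std::string buffer((std::istreambuf_iterator<char>(in)),
                       std::istreambuf_iterator<char>());
    xor_in_place(buffer, key);
    return buffer;  // an embedding program would pass buffer.c_str() to jl_eval_string
}
```

Opening both streams with `std::ios::binary` is a deliberate choice: the encrypted bytes are not text, and text-mode translation on some platforms could alter them.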

Distributing your software

Install julia on the target machine
Distribute your program binary, as well as the .encryptedjl file

Updating the software
Basically, since the julia installation and your julia program are separate from the compiled binary, they can be updated independently. Simply copy over the new version of julia, or the new version of the .encryptedjl file, to the target machine to update the software.

lawless-m · January 25, 2022, 2:56pm

What’s stopping me from building a libjulia.so that dumps the source code your C++ sends it ?

suavesito · January 25, 2022, 3:03pm

Nothing, but as the phrase goes… every piece of software is open source when you know enough assembly.

Probably for a lot of use cases this will work well enough. I’m sure that is the case for the OP’s company.

lawless-m · January 25, 2022, 3:04pm

“Locks only keep honest people out”

maharvey · January 25, 2022, 3:08pm

This is the reason why I titled the topic with “obfuscation”. At first glance 2 things are hidden:

The source code itself
The mechanism used to reveal it

Jake · January 25, 2022, 3:32pm

I am interested in your sample code.

I am also interested in the relative tradeoffs of this approach compared to using PackageCompiler to include the custom code into the Julia executable.

stevengj · January 25, 2022, 3:52pm

Couldn’t you just use Base._require_from_serialized to load from the precompiled cache file, which doesn’t require the .jl source?

maharvey · January 25, 2022, 3:55pm

I’m not sure what your suggestion here is. How would you compile your .jl code and extract the precompiled version ?

stevengj · January 25, 2022, 3:58pm

Every package in Julia is precompiled by default the first time it is loaded (you may notice the “precompiling…” messages when you are developing the package). The corresponding cache file is stored as a .ji file under the ~/.julia directory, so you just need to find this and ship it, and write some custom loading machinery around Base._require_from_serialized.

giordano · January 25, 2022, 3:59pm

Note that by default the precompiled files do embed the source code (useful in case it’s lost; it did happen to someone), but I think that’s optional and there should be a way to not embed the source code.

maharvey · January 25, 2022, 4:01pm

I think this would be very interesting. In our case the julia code still needs to be called from a C program, so I am not sure it would apply, but for a pure julia application it seems like it could work in theory.

GunnarFarneback · January 25, 2022, 4:06pm

I believe loading a .ji file isn’t really compatible with

if that is an important goal. (You can ship one .ji for each Julia version and select the right one for loading of course, but that won’t help with future Julia versions.)

stevengj · January 26, 2022, 12:10am

I don’t see why not. You write your custom .ji-loading mechanism in Julia, and call it from your C code.

stevengj · January 26, 2022, 12:13am

Yes, that was added here: https://github.com/JuliaLang/julia/pull/23898. It looks like it would be easy to strip out of a .ji file without affecting anything but development tools (Revise, debuggers, …). In fact, the only place it (Base.read_dependency_src) is currently used seems to be in Revise, and read_dependency_src already handles the possibility of missing source code.

c42f · January 26, 2022, 12:50am

One problem with loading .ji files is that the dump format depends on the system ABI so it’s not very easy to get this to work for library packages.

This isn’t really a problem when distributing an application as a complete bundle though. In that case each bundle will require a system dependent build step anyway.

People interested in obfuscation should also try out the new --strip-metadata option which Jeff added fairly recently in

I think that would need to be combined with building your library into the sysimage. IIUC --strip-metadata removes all local variable names, so the Julia IR will become a great deal more difficult to understand.

Do beware that --strip-metadata is an internal undocumented compiler option at the moment so it might not be a complete feature (for example, does it strip all the source information added by #23898?)

stevengj · January 26, 2022, 1:20am

The .ji file depends on the exact Julia version (and on the .ji files of any dependencies). But as @GunnarFarneback pointed out, if you want to ship a library that works with multiple Julia versions (not just a standalone binary) you could in principle ship multiple .ji files, one for each version of Julia, and have your installer select the right one.
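As a rough sketch of the "installer selects the right one" idea, the lookup could be an exact-match table from Julia version to shipped cache file. Everything here is an assumption for illustration: the function name `select_cache_file`, the file names, and the choice to return an empty string when no matching build was shipped. How the installer obtains the installed Julia version is left out.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical installer helper: map the Julia version found on the target
// machine to the cache file built for exactly that version. Since a .ji file
// depends on the exact Julia version, there is no fallback to a "close"
// version; an empty string means no suitable file was shipped.
std::string select_cache_file(const std::string& julia_version,
                              const std::map<std::string, std::string>& shipped)
{
    assert(!julia_version.empty());
    std::map<std::string, std::string>::const_iterator it = shipped.find(julia_version);
    if (it == shipped.end())
        return std::string();  // refuse rather than guess
    return it->second;
}
```

With a table such as {"1.6.5" → "MyPkg_1.6.5.ji", "1.7.1" → "MyPkg_1.7.1.ji"} (made-up names), a machine running 1.7.1 gets the second file, while a machine running a version that was never built for gets nothing and the installer can report that.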

c42f · January 26, 2022, 3:03am

you could in principle ship multiple .ji files, one for each version of Julia, and have your installer select the right one

Definitely! The main difficulty is setting up and maintaining the build infrastructure on all the necessary platforms. For a small number of platforms that shouldn’t be a big problem.

maharvey · January 26, 2022, 3:05pm

As promised, here is some C++ sample code running on linux with julia installed in the /opt/julia-1.7.1/ folder.

To use the sample code:

Put all files in the same folder
[if needed] Edit the makefile to point to your julia installation folder
Open a command prompt and run “make”
Encrypt your file by running “./obfuscator something.jl something.encryptedjl”
Execute the encrypted file by running “./program something.encryptedjl”

Encryption

encryption.h

#pragma once
#include <string>

// XOR msg in place with the repeating key; calling it twice with the same key restores msg.
void encrypt_decrypt(std::string &msg, std::string const& key);

encryption.cpp

#include "encryption.h"
using namespace std;

// simple message XOR cipher encryption suggestion from :
// http://www.cplusplus.com/forum/windows/128374/#msg694527
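void encrypt_decrypt(string &msg, string const& key)
{
    // an empty key would make the modulo below divide by zero
    if (key.empty())
        return;
    // XOR is its own inverse, so the same call both encrypts and decrypts
    for (string::size_type i = 0; i < msg.size(); ++i)
        msg[i] ^= key[i%key.size()];
}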

Obfuscation program

obfuscator.cpp

#include <iostream>
#include <fstream>
#include <streambuf>

#include "encryption.h"
using namespace std;

const string encryption_key = "julia";

int main (int argc, char *argv[]) {
    if (argc < 2 || argc > 3) {
        cout << "Usage: obfuscator input_file [output_file]" << endl;
        return 1;
    }

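    // open in binary mode: the XOR output is arbitrary bytes, not text
    ifstream ifs(argv[1], ios::binary);
    if (!ifs) {
        cerr << "Cannot open input file: " << argv[1] << endl;
        return 1;
    }
    string fileContent( (istreambuf_iterator<char>(ifs)),
                      istreambuf_iterator<char>() );

    string outputFilename = 3 == argc ? string(argv[2]) : string(argv[1]) + ".encryptedjl";

    encrypt_decrypt(fileContent, encryption_key);

    ofstream ofs(outputFilename, ios::binary);
    if (!ofs) {
        cerr << "Cannot open output file: " << outputFilename << endl;
        return 1;
    }
    ofs << fileContent;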

    return 0;
}

C++ program

program.cpp

#include <iostream>
#include <fstream>
#include <streambuf>

#include "julia.h"
#include "encryption.h"
using namespace std;

const string encryption_key = "julia";

JULIA_DEFINE_FAST_TLS // only define this once, in an executable (not in a shared library) if you want fast code.

int main (int argc, char *argv[]) {
    if (argc != 2) {
        cout << "Usage: program input_file" << endl;
        return 1;
    }

    // required: setup the Julia context
    jl_init();
    
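    // load the encrypted julia file into memory (binary mode: it holds arbitrary bytes)
    ifstream filestream(argv[1], ios::binary);
    if (!filestream) {
        cerr << "Cannot open input file: " << argv[1] << endl;
        jl_atexit_hook(1);
        return 1;
    }
    string filecontent( (istreambuf_iterator<char>(filestream)), istreambuf_iterator<char>() );

    encrypt_decrypt(filecontent,encryption_key);

    jl_eval_string(filecontent.c_str());
    // a Julia error does not abort the C++ program, so report it explicitly
    if (jl_exception_occurred())
        cerr << "Julia exception: " << jl_typeof_str(jl_exception_occurred()) << endl;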

    jl_atexit_hook(0);
    return 0;
}

Makefile

Makefile

CXX=/usr/bin/g++
JULIAINCPATH=/opt/julia-1.7.1/include/julia
JULIALIBPATH=/opt/julia-1.7.1/lib
CXXOPTS=-Wall -Werror
LDOPTS=-Wl,-rpath,$(JULIALIBPATH)
PATHS=-I$(JULIAINCPATH) -L$(JULIALIBPATH)
LIBS=-ljulia

all : obfuscator program

encryption.o : encryption.cpp encryption.h
	$(CXX) -c $(CXXOPTS) $< -o $@

obfuscator : obfuscator.cpp encryption.o
	$(CXX) $(CXXOPTS) $^ -o $@

program : program.cpp encryption.o
	$(CXX) $(CXXOPTS) $(PATHS) $(LDOPTS) $^ -o $@ $(LIBS)

Minumsand · January 28, 2022, 8:49am

Doesn’t @lawless-m raise a good point there?

c42f · January 28, 2022, 9:45am

That might have a grain of truth, but if the effort-to-reward ratio is too high, most attackers will give up. So what resources does the attacker have, and are they willing to spend them on your program?

I’ve always been impressed by the articles from Fabrice Desclaux and collaborators on reverse engineering the Skype binaries and protocol. For example Vanilla Skype part 1 is a great read.
