Obfuscate a Julia module and import it in Python

Hi everybody,
I intend to write a Julia module or a main function that performs a particular task (it connects to a device and performs numerical calculations). I would like to import this Julia module in Python and write a GUI around it for a better user experience (I don’t want to write the GUI in Julia for various reasons).

Is there a way to obfuscate the Julia code before importing it in Python?

That would help me distribute the code without divulging its source.
I am thinking of using PackageCompiler in Julia for this, but I don’t really know how to exploit it for my purpose.
Thanks a lot!
Regards, Ajay

Previous posts:


As of 1.8, the new `--strip-metadata` option (JuliaLang/julia pull request #42513, by JeffBezanson) and the new `--strip-ir` option (pull request #42925, also by JeffBezanson) should help significantly.


Thanks for the reply. It’s indeed good news.

Hi, do I understand correctly? Will we be able to compile Julia modules to pure binaries from version 1.8 onwards?

1.8 will not allow you to compile separate binaries for different modules and link them together. What it will do is allow users to make PackageCompiler apps that are significantly smaller.


Could you please answer my original question in the context of Julia 1.8 (using a compiled Julia module in Python)? Is there any example code or documentation somewhere? Thanks a lot in advance!

It seems you can already do this with Julia 1.8-DEV, but there’s one caveat.

The two PRs linked in this thread were merged in Nov. or earlier, so they should be in the nightly/1.8. That means you can test the new options.

You CAN make a sysimage; that has been possible for a long time. With the new non-default options it will just be smaller (not small, though there are some tricks to make it smaller still) and more obfuscated. A pointer to some docs: “Creating a sysimage for fast plotting with Plots.jl” in the PackageCompiler documentation.

Now the catch: once you compile, e.g. to a sysimage (I don’t think you can compile individual modules yet, but that shouldn’t be needed, you just compile all of your modules/code into one sysimage), your binary executable code is no longer portable between architectures or operating systems. It also doesn’t provide any speed advantage, except for lowering startup overhead. I think this is why sysimages are relatively little used, at least by me. Many people use them locally, but I think that is still a small fraction of users, which I can understand given the learning curve and downsides and not much upside. Maybe the new options change things.

You can already call Julia from Python (or the other direction; the same goes for Java, R and many other languages). That seems like an orthogonal issue, so if you get the Julia-only part to work first, calling it from e.g. Python will follow.

I think I would know if this were much used, or if a lot of docs relating to it were available. I think this is simply still too new for many docs to exist. You could help by making some: experiment, since new uses make the best docs!

[Note: Python is often said to be interpreted (thus slow), but by default it’s actually compiled. It’s just compiled to bytecode (still slow, but portable, or at least as portable as the source code), similar to Java being compiled to JVM bytecode (portable and fast, just not as fast as Julia/C/C++). Julia is by default compiled to LLVM bitcode (still portable; you have it for installed packages/modules in the .julia folder), and then optionally all the way to x86 or ARM code etc. That gives the speed, but since it’s no longer bitcode or bytecode it’s no longer portable, just as with C and C++, or with such code modules called from Python. There are some possible workarounds for that, similar to what Python already does and what Julia does with JLLs, though none that I’ve heard of being implemented for pure Julia code.]
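
The bytecode point in the note can be checked with Python’s standard library alone. The sketch below is only an illustration (the function `add_one` is made up for it): it shows that CPython has already compiled a function to bytecode by the time it can be called, and that the compiled object still carries the local variable names, which is the kind of metadata the stripping options discussed in this thread are meant to remove on the Julia side.

```python
import dis


def add_one(x):
    return x + 1


# CPython compiled the function body to bytecode when the `def` statement ran.
bytecode = add_one.__code__.co_code
print(type(bytecode).__name__, len(bytecode), "bytes of bytecode")

# The code object keeps metadata such as local variable names.
local_names = add_one.__code__.co_varnames
print("local variable names:", local_names)

# A readable listing of the bytecode instructions.
instruction_names = [ins.opname for ins in dis.get_instructions(add_one)]
print(instruction_names)
```

The exact instruction names differ between Python versions, so the listing is printed rather than assumed.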

Hi Palli, thanks a lot for the detailed reply. After reading it several times, I am not quite sure whether it directly answers my question. But here is my best guess at the steps involved. Perhaps you could confirm whether my interpretation is correct.

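  1. Create a Julia module (i.e. MyModule)
     module MyModule
     # The empty parentheses are required; without them no callable method is defined
     function my_print()
         println("This is my module")
     end
     end
  2. Create a system image using PackageCompiler
     using PackageCompiler
     create_sysimage(["MyModule"], sysimage_path="sys_mymodule.so", precompile_execution_file="precompile_mymodule.jl")
  3. Import the Julia module in Python
     from julia import Julia
     jl = Julia(sysimage="sys_mymodule.so")  # start Julia with the custom sysimage
     from julia import MyModule
     MyModule.my_print()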

I would assume Julia then contains the precompiled MyModule package and the source code is not directly viewable. Is that correct?

It seems you’re on the right path (I didn’t try, I’m assuming this worked for you).

What’s missing for you is somehow invoking both (or, I see now, all three) of these Julia 1.8 options, from NEWS.md on the master branch of JuliaLang/julia:

Command-line option changes

  • New option --strip-metadata to remove docstrings, source location information, and local variable names when building a system image ([#42513]).
  • New option --strip-ir to remove the compiler’s IR (intermediate representation) of source code when building a system image. The resulting image will only work if --compile=all is used, or if all needed code is precompiled ([#42925]).

To add them, this seems to be the option you need, from the PackageCompiler reference documentation:

  • sysimage_build_args::Cmd: A set of command line options that is used in the Julia process building the sysimage, for example -O1 --check-bounds=yes.

But even as is, it might be OK, depending on how paranoid you are.
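
Since the end goal is to drive this from Python, here is a rough, untested sketch of how the two flags could be passed through `sysimage_build_args`. The helper names `make_build_script` and `build_sysimage` are made up for illustration, and the file names are the ones from the steps above. It assumes a `julia` executable (1.8 or later) on the PATH with PackageCompiler installed, and that the precompile file exercises all the code you need, since `--strip-ir` only works if everything needed is precompiled.

```python
import subprocess


def make_build_script(package, sysimage_path, precompile_file, strip=True):
    """Return Julia source that builds a sysimage via PackageCompiler.

    Hypothetical helper: it only assembles text, it does not run Julia.
    """
    # The Julia 1.8 flags quoted above, passed as a Cmd literal (backticks)
    # through the documented `sysimage_build_args` keyword.
    build_args = "--strip-metadata --strip-ir" if strip else ""
    return (
        "using PackageCompiler\n"
        f'create_sysimage(["{package}"]; '
        f'sysimage_path="{sysimage_path}", '
        f'precompile_execution_file="{precompile_file}", '
        f"sysimage_build_args=`{build_args}`)\n"
    )


def build_sysimage(script, julia="julia"):
    """Run the script with a Julia executable (assumed to be on PATH)."""
    return subprocess.run([julia, "-e", script], check=True)


script = make_build_script("MyModule", "sys_mymodule.so", "precompile_mymodule.jl")
print(script)
# build_sysimage(script)  # uncomment on a machine with Julia and PackageCompiler
```

Keeping the script generation separate from the call to Julia makes it easy to inspect exactly what will be run before committing to a long sysimage build.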

Yes, I believe none of the Julia source code (or at least not the full source code) is directly accessible already, even if you don’t add any of those options. What you have is LLVM bitcode (aka “compiler’s IR (intermediate representation)”), unless you strip it, plus machine code, and either could theoretically be decompiled. Just as C or C++ machine code can be decompiled, with tools or manually, so can Julia or any other compiled language.

There is no specific Julia decompiler available that I know of, and I think I would know of such a tool. However, one could be made at any point in time (and some programmers are good at reading machine code/assembly and “decompiling”/inferring manually). You don’t want to leave clues like “local variable names” around when you can strip them out. If you decompile, you would get some cryptic made-up variable names at best, unless you have great expectations for artificial intelligence. It’s getting good at making up code and at converting between some programming languages, just as it’s OK at translating natural languages, so while I’ve not seen it done from machine code, I wouldn’t rule it out later.

I would also want method (i.e. “function”) names to be stripped out, but it’s NOT clear from the (Julia 1.8) docs above that this is done. It’s possible it’s done for non-exported methods; I wouldn’t rule it out, but I wouldn’t bet on it. You could run the Unix `strings` command on the sysimage to see if you find something interesting like that. If local variable names and method names (and the docstrings) are absent, I think you’re as obfuscated as you can expect to be (in any language).
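
If `strings` isn’t available (e.g. on Windows), a rough equivalent is easy to sketch in Python. The helpers `printable_strings` and `leaked_names` below are made up for illustration, and the demo runs on a small fake file rather than a real sysimage; for a real check you would pass the path of your own sysimage and the identifiers you care about.

```python
import re
import tempfile
from pathlib import Path


def printable_strings(path, min_len=4):
    """Yield runs of printable ASCII bytes, roughly like Unix `strings`."""
    data = Path(path).read_bytes()  # reads the whole file into memory
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    for match in re.finditer(pattern, data):
        yield match.group().decode("ascii")


def leaked_names(path, names):
    """Return the subset of `names` that appears somewhere in the file."""
    found = set()
    for text in printable_strings(path):
        for name in names:
            if name in text:
                found.add(name)
    return found


# Demo on a tiny fake binary; a real sysimage path would go here instead.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "fake_sysimage.bin"
    sample.write_bytes(b"\x00\x01my_print\x00\xff\xfesecret_helper\x00ab\x00")
    demo_found = leaked_names(sample, ["my_print", "secret_helper", "not_there"])

print(sorted(demo_found))
```

A name showing up this way only proves the string is present in the file; a name not showing up does not prove the corresponding code can’t be recovered.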

At least the (one) “entry point”, i.e. the method my_print and the module MyModule, needs to be left unstripped, or your code wouldn’t work from Python because of MyModule.my_print(). So I find it likely that none of the module or method names are stripped, until proven otherwise. This would be unlike C/C++, where even main shouldn’t need to be part of the executable. It’s plausible that non-exported method names are not there, or at least could be stripped.

Even with that info there, I believe it’s not simple to get source code back, i.e. an equivalent, less readable version. Getting the exact same source would be impossible.

FYI: an older discussion (one of my old comments from pre-Discourse times is there as the first answer, under my full name; here I use my nickname):
https://groups.google.com/g/julia-users/c/13-1_c64KIc

It’s extremely easy to decompile. LLVM for a long time shipped with a CBackend that would convert LLVM into C.

I would really like to see that… decompiling Julia and getting C code. :slight_smile: I wouldn’t rule out the possibility, since Julia often generates code identical to (and thus as optimized as) that of C or C++. You would likely see some boilerplate code for allocation (you should for C too, as it’s explicit there). Maybe there would be a little less code, since Julia code isn’t littered with deallocation (“free” in C or “delete” in C++), so that code wouldn’t be as easily usable (the GC takes care of that).