Could Someone Explain To Me the Desire for Small Julia Binaries?

I’m not aware of “current efforts” in the direction I outlined. The existing StaticCompiler might be able to do what you’re hoping, and in my link I outlined a way that may greatly simplify StaticCompiler and probably make it possible to support the full language. The proposal is all based on changes that are already implemented in Julia 1.9. But I don’t know of anyone who’s actually working on doing that.

2 Likes

If you want to create binaries for a generic target (e.g. raw x86_64 without extensions like AVX512 or such), then yes, that’s already possible(ish), as long as you generate that code on a platform whose fiddly bits, like pointer sizes, match the target’s. So if you’re on a desktop supporting AVX512, targeting raw x86_64 ought not to be an issue at all. However, even targeting x86 (i.e. 32 bit) from your x86_64 host can run into issues, like sizeof(Int) of the host platform no longer corresponding to sizeof(Ptr{Any}) of the target platform (an assumption we currently bake in, and one that’s hard to remove). This is a problem because pretty much any code that accesses arrays or strings, or uses unsafe_* at all, handles pointers at some point and breaks under that mismatch.
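To make that pointer-size assumption concrete, here’s a small sketch (the array walk is only an illustration of the pattern, not code from Base):

```julia
# On any native build, Julia’s Int matches the machine’s pointer width:
@assert sizeof(Int) == sizeof(Ptr{Any})

# Typical unsafe_* code implicitly bakes that in: the element offset
# below is computed with host Ints. Cross-compiled to a 32-bit target,
# every such offset would need to be 4 bytes wide instead.
v = [10, 20, 30]
x = GC.@preserve v unsafe_load(pointer(v), 2)
@assert x == 20
```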

That grows in complexity very quickly, unfortunately, due to the large number of different microarchitectures in use today. The safest way to do this is to have a server farm with the most “up to date” CPUs for a given architecture, so that compiling to a subset of the CPU features supported in the farm is ok. The problem with that approach, though, is the amount of hardware you go through with each new CPU generation, which won’t be cheap at all :frowning: The solution is a compiler capable of targeting a different microarchitecture without itself running the compiled code locally (interpreting is fine though! You don’t get Illegal Instruction there), which then lets you deploy the artifact to your target machine. That’s cross compilation in a nutshell.

I really don’t know if there’s an easy win to be had here :person_shrugging: It’s a tricky problem, and the number of questions about this seems to increase, so I’d say there’s definitely demand for it. Pkgimages don’t really solve the deployment issue either: as far as I know, they’re all about caching native code for local use, not about compiling locally and deploying the cache to a server somewhere. To take advantage of the target’s features, you must either compile on the server itself (infeasible in lots of situations, which is why people build Docker images & bake the caches into them for deployment in the first place) or compile locally for the architecture of the server and then deploy to the target, also known as cross compilation.

Hence, my position is to prefer going straight for the big goal, because from what I can tell (and maybe I’m totally wrong here), solving the actually experienced pain points requires proper cross compilation. I definitely don’t mean to say that the hard problem stands in the way of the easy one (you can use PackageCompiler and StaticCompiler today to do just what you describe, with the caveats from above), but that it ought to be taken into consideration when more effort is spent in that direction, due to the similarity of the requirements.

2 Likes

Embedded systems were mentioned before, and that requires cross-compilation, right?

My bigger question is what and where these assumptions about running the code on the same machine as the compiler are. My completely uninformed impression is that Julia is compiled to LLVM, which can be further compiled for a variety of architectures, so cross-compilation seems like LLVM’s job rather than Julia’s. That impression is iffy already because I know of GPUCompiler’s existence, which sounded like the work was being done in Julia, not LLVM.

Here’s one: when you encounter Sys.iswindows(), is that a compile time query or a runtime query?

if Sys.iswindows()
    include(raw"C:\more\code\to\compile.jl")
end

is relevant for compilation time (it might be just filesystem navigation for the purpose of finding the source files), but

function __init__()
    if Sys.iswindows()
        ccall(...)
    else
        ccall(...)
    end
end

is surely a runtime query.

When compile time and runtime are on the same platform, it doesn’t matter. But when they differ, it does.
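For what it’s worth, Sys.iswindows() boils down to comparing the Sys.KERNEL constant that was fixed when julia itself was built, so in ordinary (non-__init__) code the compiler can fold the branch away entirely - a tiny sketch:

```julia
# Sys.KERNEL is baked in at build time, so this branch is resolved at
# compile time with the *host’s* answer - exactly what a cross compiler
# would have to override for a different target:
f() = Sys.iswindows() ? "windows path" : "unix path"

@assert Sys.KERNEL isa Symbol
@assert f() isa String
```

Inspecting `@code_typed f()` shows a single constant return; the dead branch is gone.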

7 Likes

Yes, that’s a good example. It’s not impossible to solve: e.g. we could have an AbstractInterpreter that takes these things into account when “evaluating” them and either gives a warning or inserts appropriate runtime checks. The easier way is to not try to statically compile such code, which is ultimately the same reason pkgimages aren’t really portable between machines - AFAIK, the result of such code executed in global scope is just baked into the pkgimage. Yet another way is to take a module after parsing & the initial top-level execution and serialize the resulting state to disk, ignoring any potential mismatches due to such calls (this is by far the most dangerous option).

My gut feeling currently leans towards a mix of the first & last options, but time will tell which design is ultimately right, has the fewest downsides, and behaves most acceptably in this context.

I don’t care about “small” shared libraries. I’ll be happy if it becomes truly convenient to produce hundred-megabyte sized shared libraries to be used from C/C++.

I’ve run into strange bugs when calling Julia code as a shared library from C/C++. I may be wrong but eventually I concluded that it was unsafe to mix std::iostream with Julia’s print methods in the same application. Running jl_init() seems to do something funny to the console interaction and IO buffers.

Please do file an issue about such reproducible bugs, so that they can be fixed :slight_smile: Not to derail the thread, but julia uses libuv internally for printing & IO abstraction, so any issue with julia ought to occur with pure libuv as well.

1 Like

Does the environment variable JULIA_CPU_TARGET affect package images as well? This would at least allow potential cross compilation between microarchitectures on the same platform, I think.
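As a starting point for experiments, a session can at least report what it’s running on and what target string it was launched with (the multi-target value in the comment is hypothetical, borrowing the sysimage syntax):

```julia
# JULIA_CPU_TARGET must be set before julia starts; from inside a
# session we can only read it back and compare against the host CPU:
println("host CPU:   ", Sys.CPU_NAME)                      # e.g. "skylake"
println("CPU target: ", get(ENV, "JULIA_CPU_TARGET", "(unset)"))
# A multi-target string such as "generic;skylake,clone_all" asks for a
# generic fallback plus specialized clones, as used for system images.
```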

Thanks for the advice. My problem was described in this post. Let me know if there’s something obvious that I missed. Otherwise, I’ll go on to file a github issue.

If my C/C++ application already uses stdio / iostream for IO, maybe it’s problematic to link with any other library (julia.h or otherwise) which loads libuv under the hood? Sorry to hijack the thread further.

neat

22 Likes

This is extremely important to make Julia viable as a scripting language. Of course, one would say it is not the right tool for the job, but why not use your favourite language for everything? :slight_smile:

To that end, I am probably more keen on having small memory footprints than small binaries, but the two go hand in hand, as those binaries have to be loaded into memory somewhere - and then that memory gets used, of course.

To give an example, I run a small Julia script to record my home automation states. It feels like quite a big waste. Julia does not even start below 100-200 MB of RAM usage, and adding a few packages grows your sysimage to a multiple of that in no time. Do it at a larger scale, and even a powerful desktop PC will beg for mercy.

For comparison, a Python script could do the same with 8-20MB, and Python is not the best competitor in that space. But I would be happy with something similar.

1 Like

Focusing on the RAM usage, disabling the compiler with --compile=no may help if you can precompile to native code in Julia 1.9. Then you remove LLVM and save disk space as well.

Maybe this is already possible on some level with StaticTools.jl.

1 Like

@tim.holy , I’m not sure I understand. Do you mean that Julia is currently looking for developers who can make the small binaries/dll/so a reality? Doesn’t the Julia core team have the skills to implement them? What is missing for these features to be available? Funding?

Core developer time is always in short supply, and funding for particular projects can indeed speed things along. But that’s not what I was referring to; in this case, it was about their impact on the package ecosystem as described in the next sentences:

Not only are they often amazingly talented, but typically they know at least a niche in their current world very well, and might notice (and fix) deficiencies in the comparable Julia offerings.

2 Likes

I really like reading your comments and hearing your point of view. In your view, which Julia deficiencies need to be addressed, or which features are lacking, for Julia to see an exponential rate of adoption?

It may be a naive question, but given that virtual machines can already translate instruction sets (I’m thinking of an Apple M1 running Windows 10 for x86_64 in a VM, for example), wouldn’t it be possible to just “local compile” on a server farm with loads of various VMs? Thus getting back to the “orthogonality” mentioned by Tim Holy.

If not, please enlighten/correct me :slight_smile:

In terms of H/W cost, this would be “minimal”, in the sense that it wouldn’t add too much overhead. As for S/W, just a few Windows licences. The main cost would be setup and administration.

Apple has a big advantage - they control all ends of the (hardware) equation. That is to say, they know both exactly which CPU is running on an M1, as well as what CPU is running on their x86_64 machines. This is reflected in the type of apps that are supported to run under Rosetta (2) - those that are originally compiled for an x86_64 Apple device. Notably, x86_64 virtualization (e.g. a VM) of different CPUs is not supported:

  • Virtual Machine apps that virtualize x86_64 computer platforms

Rosetta translates all x86_64 instructions, but it doesn’t support the execution of some newer instruction sets and processor features, such as AVX, AVX2, and AVX512 vector instructions. If you include these newer instructions in your code, execute them only after verifying that they are available. For example, to determine if AVX512 vector instructions are available, use the sysctlbyname function to check the hw.optional.avx512f attribute.

Unfortunately, those AVX, AVX2 and AVX512 instructions are exactly the kinds of instructions you’ll want to use in a native binary that expects to run on a CPU supporting these - any kind of emulation of them (if they were supported) would likely be slower than not trying to emulate them in the first place.
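To make Apple’s quoted advice concrete from the Julia side, here’s a hedged sketch of that sysctlbyname check (macOS-only; hw.optional.avx512f is the attribute named above, the ccall follows the standard BSD sysctlbyname(3) signature, and the function returns nothing on other platforms):

```julia
# Query hw.optional.avx512f via the BSD sysctlbyname(3) interface.
# Returns true/false on macOS, nothing elsewhere or on error.
function has_avx512f()
    Sys.isapple() || return nothing
    val = Ref{Cint}(0)
    len = Ref{Csize_t}(sizeof(Cint))
    ret = ccall(:sysctlbyname, Cint,
                (Cstring, Ptr{Cvoid}, Ptr{Csize_t}, Ptr{Cvoid}, Csize_t),
                "hw.optional.avx512f", val, len, C_NULL, 0)
    ret == 0 ? val[] == 1 : nothing
end
```

Guarding AVX-512 code paths behind a runtime check like this is what lets one binary run both natively and under translation.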

This is just one example of course - in general, virtualizing a CPU means virtualizing a CPU that has a subset of the features your real host CPU has, and emulating different instructions (or a different architecture entirely) is usually too expensive computationally to be worth it (at least for the types of systems that are of interest for server deployment - for a long/fun afternoon, you can read up on emulation of the PlayStation 3, released in 2006, which only recently BARELY became emulatable on modern hardware!). Add on to that the fact that compilation is usually a pretty demanding task (some even use compilation of Google Chromium as a CPU benchmark…) and you get a recipe for “not economically worth it”.

And after all of that tremendous amount of work, you’re still no closer to just deploying a docker container to your production server and compiling locally there - worse yet, you now have two serverfarms you need to run (not to mention that you likely don’t want to crowdshare the hardware for security/economic reasons…), your compiler farm and your deployment environment, effectively doubling the cost of your hardware. If you then upgrade your deploy environment to a beefier CPU, you can’t really emulate the new features effectively, so you have to upgrade your compile environment in lockstep, adding to the cost…

4 Likes

So, I am coming back to add another thought to this question! In my experience, when working with air-gapped systems, it would be very convenient to have small binaries that are generated in a development environment or server and then deployed into a production/live environment. It makes things a lot simpler and helps with moving runtimes quickly between environments.

1 Like

Then I welcome you to the group of people that want cross compilation, as described above :slight_smile: The “small” part of “small julia binaries” is of course a bit of a flexible target.

2 Likes