I’ve seen this notion pop up repeatedly about wanting small binaries created within the Julia ecosystem. I personally have been quite confused by this desire, probably because I don’t work in a space where small binaries matter. My mental oversimplification of the problem is that, as personal computer storage space has gotten so large, having big binaries is not a problem any more – as I see it.
But I must be missing something crucial.
Could anyone explain to me why we want small binaries within the Julia ecosystem in terms of what benefits “people” are looking for, why having small binaries is a selling point for a programming language ecosystem, and how this could impact the daily Julian?
Thanks!
~ tcp
P.S. I meant to say that “people” is me vaguely referring to my impression of the Julia Community writ large as well as other programming language ecosystems.
Personal computer storage space is not the issue – people often operate on shared machines where there is competition for disk space, I/O, etc. In those environments, there’s a lot of value in Julia not introducing a large regression in binary size over competitors.
I don’t think Julia has big penetration in embedded systems yet, but binary size there is a dominant factor in what’s feasible at all.
One of my goals is to implement a full attitude and orbit control subsystem (AOCS) for a satellite using Julia. In this case, a small binary is necessary. Space-grade memory is very expensive, and I would love the ability to compile Julia code to an executable and run it inside a very minimal Linux installation.
The other option is to run Linux and have the entire Julia setup. However, this approach requires a much more capable computer. We will probably fly a COTS computer and be restricted to CubeSats for now. Bringing this kind of AOCS to “bigger” missions will take a while if we do not have small binaries.
We do use Julia in embedded systems (underwater modems), and having precompiled binaries is very desirable, as compilation not only makes start-up time long but also leads to timeouts during boot-up.
When distributing firmware updates to users, smaller binaries are desirable - especially because the devices are often out at sea and in inaccessible places where connectivity isn’t fantastic.
A related issue, the ability to create small shared libraries, would be a win for both Julia and languages like Python, R, etc: folks could start using Julia instead of C to write the code that isn’t fast enough in the native language.
Think of the case where you have a big production codebase written in C/C++.
You need to add some functionality to it as new modules.
You do that by adding libraries, with the required functions implemented, to be linked in.
Being able to generate those libraries with Julia means you can do the work quickly without having to move the whole project to a new language.
Moreover, sometimes the project is maintained by a software team while some functionality is added by the algorithm team. It means people without C/C++ knowledge can collaborate on the project.
This is something very common in the industry.
One popular solution for this, at the moment, is MATLAB Coder.
I think Julia has a great use case here. It will win this market once it can generate those static / dynamic libraries. By the way, Windows support is needed here as well.
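To make this concrete, here’s a rough sketch of what that workflow can look like today with `Base.@ccallable` plus PackageCompiler.jl’s `create_library` (the package name `MyAlgos`, the paths, and the exact keyword arguments here are illustrative assumptions, not a verified recipe):

```julia
# src/MyAlgos.jl -- a hypothetical package exposing a C-callable entry point.
module MyAlgos

# Base.@ccallable gives the function a C ABI, so it can be exported from a
# shared library and called from C/C++ (or from Python/R via their FFI).
Base.@ccallable function filter_gain(x::Cdouble, k::Cdouble)::Cdouble
    return k * x + (1.0 - k) * x
end

end # module
```

```julia
# build.jl -- run in an environment where PackageCompiler is installed.
using PackageCompiler

# Produces a shared library (plus the bundled Julia runtime) in build/mylib;
# keyword names may differ between PackageCompiler versions.
create_library("MyAlgos", "build/mylib"; lib_name = "myalgos")
```

The C/C++ side would then just declare `double filter_gain(double, double);` in a header and link against the generated library. The catch, and the reason small binaries matter here, is that the generated library currently drags the whole Julia runtime along with it.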
At least at JuliaCon 2018, basically everyone running Julia in production was doing it by pushing a Docker image onto an AWS service like Batch, Fargate, or Lambda, or a similar cloud computing environment.
That means you have to move the image up and down each time to update it.
Which you might do a lot if you are debugging something that only occurs in the cloud environment.
Moving a multi-GB image every time you want to make a change is fine if you are somewhere with 1 GB/s fibre, but a lot of the world doesn’t have that. In the UK I was lucky if I got 40 MB/s; in Australia I am lucky if I get 5 MB/s.
Hang on. So Julia could be the no-brainer number 2 when other languages have a two-language problem, and small binaries can unlock this potential? That might just be the disruptive change we need to make Julia adoption grow exponentially instead of current linear trends. Is there any way someone useless like me (who knows very little about binaries, compiled libraries, LLVM or deep Julia internals) can help make small binaries happen?
I’m hoping 1.9 will already be that change (though perhaps it won’t be enough until package load times improve too), but yes, I do think it could be a big boost for Julia adoption. Among all the changes we could make, it might have a distinctive impact: because it’s often the most talented developers in an ecosystem who write the libraries that everything else relies on, introducing them to Julia could be a targeted recruiting opportunity for a very special group of developers. Not only are they often amazingly talented, but typically they know at least a niche in their current world very well, and might notice (and fix) deficiencies in the comparable Julia offerings.
Maybe it’s also about I/O. A smaller data file means less I/O, so reads are faster and there are fewer errors, and decoding after reading into memory can be roughly 10x more efficient. On the other hand, smaller files are more prone to corruption when they lack recovery information in the head or tail, and lots of people never account for the time-consuming “unzip” step. Whatever the case, for data files that get reused, smaller is better as long as there are no errors; otherwise it’s meaningless.
Hey folks! Thank you so much for the explanations! To summarize what I’ve learned here about the desire for small binaries, small binaries:
Support easier distribution of Julia programs or tools through package managers (per @lmiq)
Enable usage in contexts where storage space comes at an immensely high premium (thanks for the awesome example @Ronis_BR regarding CubeSats and @johnmyleswhite’s point about embedded systems)
Minimize downtime due to maintenance/updates (per @mchitre regarding underwater modems [super cool!])
Let Julia become a “sane” high-performance backend for other ecosystems, such as Python or R packages, to wrap instead of C/C++ (I’ve tried this @tim.holy with some collaborators and it has worked to some extent – you are definitely right)
Simplify/ease working with containerized environments involving Julia (really terrific point @oxinabox and @adienes – I don’t work with containers so much, so this is a powerful point)
Help shared environments avoid regressions in storage management (I run into this problem, @johnmyleswhite)
I think these are all terrific points and some that I knew about but others I haven’t really given much thought to. Thank you so much for sharing your ideas to help me centralize my thinking! I knew that small binaries were a good thing, but I just couldn’t clearly articulate the “why” to myself. You all have helped convince me even further!
Feel free to share more ideas/examples about why this is important for the Julia Community, but I think I will mark this summary as the answer for now. Depending on time, I may come back to it to update the summarization as well.
Not a brand-new reason - more of a shared experience - but more often than not, I have quickly made some simple scripts for my job which couldn’t easily be shared with my fellow non-IT coworkers. This relates to tool distribution, but within an office. PackageCompiler.jl can generate executables, but at ~200 MB as a bare minimum, that is not an option when coding “many” small helper programs.
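For context, here is roughly what building such a helper looks like with PackageCompiler.jl today (the package name and paths are made up for the example); it works, but the output is the ~200 MB problem described above:

```julia
# build_app.jl
using PackageCompiler

# MyHelper is a hypothetical package whose source defines `julia_main()::Cint`
# as the entry point, as described in the PackageCompiler docs.
create_app("MyHelper", "MyHelperCompiled"; force = true)

# The resulting MyHelperCompiled/bin executable bundles the whole Julia runtime
# and standard libraries, which is where most of the size comes from.
```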
Another reason, linked to containers, is that this would open wide the world of microservices (I think that’s a keyword here, not mentioned above).
One aspect that hasn’t been mentioned so far, and that to me is a reason why I don’t think it’s just a 30-1000 LOC change with pkgimages like @tim.holy thinks, is cross compilation. In essence, cross compilation means compiling a program for an architecture other than the one where the compiler is running. Currently, all julia compilation happens under the assumption that the code will run on the same machine as the compiler itself - PkgCompiler and StaticCompiler weaken that assumption a bit, to only assume the same computer architecture (x86_64 or ARM, for example). In order to compile for e.g. Mac M1, we need a bootstrapped julia install capable of running on that M1 to compile a sysimage and subsequently packages & pkgimages.
GPUCompiler breaks that assumption a bit, since we don’t actually have a julia runtime & compiler running on the GPU itself; it’s all compiled on the CPU and then sent to the GPU for execution. In fact, GPUCompiler goes to great lengths to remove the remaining bits of those assumptions from the LLVM IR julia spits out. This ~mostly works, but you can very easily run into issues with pointer sizes and some other assumptions that happen in the compilation pipeline, which make actual cross compilation (currently) infeasible (some experiments to the contrary notwithstanding).
Now, you may say that static compilation is different from cross compilation and should thus be easier, and you’d be right, but only to an extent - the compiler still assumes during codegen and optimization that the final artifact will run on the same machine, and thus targeting a different microarchitecture for e.g. deployment (think a server whose x86 CPU has different features than your development machine) is shaky at best. Static compilation to the host system is thus a special case of cross compilation, and I think it would be a mistake to tackle that special case first in julia itself without planning ahead for how to get full cross compilation (which, truth be told, I very much see as the desirable endgame).
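For comparison, here is roughly what runtime-free static compilation to the host looks like with StaticCompiler.jl and StaticTools.jl (a sketch based on their documented examples; the exact API and its restrictions may differ between versions). Note that there is no supported way to ask for a different target triple or CPU microarchitecture; the target is whatever the running session is on:

```julia
using StaticCompiler, StaticTools

# A main-like function that avoids the Julia runtime entirely:
# no allocations, no dynamic dispatch, StaticTools' c"..." strings and printf.
function hello(argc::Int, argv::Ptr{Ptr{UInt8}})
    printf(c"hello from a tiny static binary\n")
    return 0
end

# Emits a small standalone executable, but only for the host machine:
# the target is taken from the session doing the compiling.
compile_executable(hello, (Int, Ptr{Ptr{UInt8}}), "./")
```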
I do have a list of some of the issues I encountered so far in playing around in this domain, but this post is unfortunately too small to hold it
Would the current efforts allow, someday, including compiled julia code in R packages, with an interface for the programmer that looks like what Rcpp does? That would be terrific. The current possibilities with JuliaCall, although they have the merit of existing at all, are not as practical…
So I assume RCall.jl is used. I’m not sure, but I think it allows calling R in the same process. R is GPL, so that means your whole program is GPL. If you’re trying to get out of that, then Rcpp doesn’t help you either. So what’s the point of (small) binaries for you? [It’s already a problem to distribute GPL-free R-only binaries and/or small R binaries, which seem to be 77 MB+.]
In some few cases (small) binaries can help, but in very many cases distributing source code is ok (and/or a sysimage), and startup is just as fast. And the source code can be way smaller than the binary…
That doesn’t seem like a problem. You would do the same as with Python: distribute the (likely smaller) source code and let the package depend on the Julia runtime (a rather large but one-time, shared cost) via an apt or snap or whatever package…
Again, not a problem (e.g. do most Python users compile in that setting?). It seems like a lot of the fixed overhead (of the runtime and LLVM) could be shared across containers. Startup doesn’t need to be slow, even with source code (assuming it’s precompiled), is getting better in 1.9, or you can compile your source code (partially or fully) into a sysimage.
Yeah, cross compilation is a beast, and my estimate of difficulty did not include it. I don’t understand why that would need to block progress on generating small libraries/binaries, though; to me they seem like somewhat orthogonal problems, and it seems possible to sidestep the need for cross compilation just by having enough build servers of different platform types. So personally I wouldn’t let the hard problem stand in the way of a win on the comparatively easy problem.