Julia memory overhead

I have received some queries from coworkers about Julia’s memory overhead. Members of our HPC team are concerned that each Julia session seems to use about 160-200 MB of RAM. Our current deployment model is to run many single-threaded jobs on AWS nodes with relatively little RAM, and additional RAM is an expensive resource on AWS. The speed and memory usage of my algorithm itself seem quite reasonable, but being able to fit another 200 MB of data into each job could be valuable to us.

So I have a couple of questions. They may have been answered elsewhere, but I couldn’t find an up-to-date answer.

  1. When I start Julia, it seems to use about 160 MB of RAM:
$ /usr/bin/time -v julia -e 1
	Command being timed: "julia -e 1"
	User time (seconds): 0.22
	System time (seconds): 0.16
	Maximum resident set size (kbytes): 158612

Does anyone know how this 160 MB breaks down? There is LLVM, BLAS, LAPACK, libgit2, etc., as well as the compiler, the system image, and so on. The other day I found a post about an older version of Julia that mentioned 38 MB of RAM usage; what has changed since then?
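One way to start on the breakdown question on Linux is to see which mappings a running Julia process actually has resident. The sketch below is an illustration, not an established tool: the function name rss_by_mapping is hypothetical, and it simply sums the Rss: fields that the kernel reports per mapping in /proc/<pid>/smaps, grouped by mapped file (shared libraries such as LLVM or OpenBLAS, the system image library) or by [heap]/[anon] for memory that is not file-backed.

```shell
# Hypothetical helper: list the largest resident mappings of a running
# process, grouped by mapped file, by summing the "Rss:" lines in
# /proc/<pid>/smaps (Linux only).
#   $1 = PID of the process to inspect
#   $2 = number of rows to print (default 15)
rss_by_mapping() {
  awk '
    # Mapping headers look like "addr-addr perms offset dev inode [path]";
    # anonymous mappings have no path field.
    /^[0-9a-f]+-[0-9a-f]+ / { name = ($6 == "") ? "[anon]" : $6; next }
    # Each mapping block contains an "Rss:   <n> kB" line.
    /^Rss:/ { rss[name] += $2 }
    END { for (n in rss) printf "%8d kB  %s\n", rss[n], n }
  ' "/proc/$1/smaps" | sort -rn | head -n "${2:-15}"
}

# Usage sketch: start an idle Julia process, give it time to load, inspect it.
#   julia -e 'sleep(30)' &
#   sleep 5
#   rss_by_mapping "$!"
```

Rss counts pages shared with other processes (for example, library code) in full for every process. Matching ^Pss: instead of ^Rss: gives each process a proportional share, which is the fairer number when several Julia processes run on the same node.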

  2. Is there some way of cutting this back? For instance, if I want to deploy something that doesn’t use BLAS and LAPACK, could I make a Julia build without them? Alternatively, are there any environment variables I can set that affect the amount of RAM used? (For example, I read somewhere that OpenBLAS allocates a lot of RAM on Linux because it tries to allocate scratch space for many threads, so limiting it to one thread might help?)
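The OpenBLAS idea is cheap to test empirically. OPENBLAS_NUM_THREADS is an environment variable read by OpenBLAS itself; whether it changes the startup footprint of a particular Julia build and version is something to measure rather than assume. A minimal sketch, assuming GNU time is installed at /usr/bin/time (its -f '%M' format prints the maximum resident set size in kilobytes, the same figure as in the -v report above) and a GNU-style env that supports -u; the function names are hypothetical:

```shell
# Hypothetical helper: print the peak RSS (in kB) of a command, using GNU
# time's %M format. time writes its report to stderr, so stderr is sent
# down the pipe while the command's own stdout is discarded; the report
# is printed after the command exits, so it is the last line.
peak_rss_kb() {
  /usr/bin/time -f '%M' "$@" 2>&1 >/dev/null | tail -n 1
}

# Hypothetical helper: run the same command with OPENBLAS_NUM_THREADS
# unset (OpenBLAS picks its own default) and with it set to 1, and print
# both peak RSS figures.
compare_openblas_threads() {
  echo "OPENBLAS_NUM_THREADS unset: $(peak_rss_kb env -u OPENBLAS_NUM_THREADS "$@") kB"
  echo "OPENBLAS_NUM_THREADS=1:     $(peak_rss_kb env OPENBLAS_NUM_THREADS=1 "$@") kB"
}

# Usage:
#   compare_openblas_threads julia -e 1
```

Running env in front of the command means the variable is set only for that one process, so the two measurements differ in nothing but the thread setting.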

This is getting a lot of likes but no answers, which suggests it’s a common question. If you don’t get a response here, maybe open an issue on GitHub? GitHub issues aren’t meant for Q&A, but I think Julia’s large memory footprint is a legitimate issue. It has affected me: I wanted to use Julia on an AWS instance with 2 GB of memory, but one master process plus 4 workers ate up about two-thirds of the system’s memory, leaving little headroom.
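For a rough sense of scale in the 2 GB case, the idle startup footprint alone can be multiplied out. The sketch below assumes that the 158612 kB peak RSS measured at the top of the thread applies to every process (Julia workers are full Julia processes) and takes 2 GB to mean 2 GiB (2097152 kB); the function name baseline_share is hypothetical. Because RSS counts pages shared between processes, such as shared-library code, in full for each process, this sum is an upper bound on the idle baseline rather than an exact figure.

```shell
# Hypothetical helper: estimate the idle baseline of several identical
# processes from one measured peak RSS.
#   $1 = peak RSS of one process (kB)
#   $2 = number of processes
#   $3 = total RAM (kB)
baseline_share() {
  awk -v rss="$1" -v n="$2" -v total="$3" 'BEGIN {
    used = rss * n
    printf "%.1f MiB (%.1f%% of RAM)\n", used / 1024, 100 * used / total
  }'
}

# One master plus 4 workers at ~158612 kB each on a 2 GiB instance:
baseline_share 158612 5 2097152   # prints: 774.5 MiB (37.8% of RAM)
```

By this estimate, idle interpreters alone would account for roughly 38% of such an instance before any user code runs.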


The solution to this would be to make Julia’s Base more modular, which is the plan in https://github.com/JuliaLang/julia/issues/5155

Over a year ago, I had a branch of v0.4 (https://github.com/ScottPJones/julia/tree/spj/lite) that let you build Julia with various parts removed, even the docs, help, and the REPL (which use a lot of memory and aren’t needed to run a non-interactive script). It also meant that building for the Raspberry Pi from scratch was no longer a 24-hour task.

The biggest problem I had when trying to do this was that so much material seemed to have been dumped haphazardly into base/docs/helpdb/Base.jl. When I started the project, v0.4 was still months from release; I didn’t know how to make sections of Base.jl compile only when enabled by my Make.user switches (a plain if ... end didn’t work), and the @static macro didn’t exist yet.
Since then, @kslimes especially has made great strides cleaning that out, and I think I can use the @static macro to handle the rest (or move even more material out of Base.jl and see whether that can be merged).

The other issues I recall running into were:

  • Dealing with cutting out unit tests, since in many cases unit tests from different areas are lumped together (which is why I made PRs to split up the unit tests the same way the source files are split, especially for strings)
  • Dealing with interactions between the different pieces I made optional (such as the many places where BigInt and BigFloat are used, e.g. with Complex, Rational, etc.)
  • Some platform-specific issues, for example, where the implementation of Int128/UInt128 on 32-bit platforms depends on BigInt.
  • I didn’t know at the time how to conditionally remove the binary dependencies completely from the builds
    (I did get most of the way there, so that it didn’t need to build BLAS, LAPACK, GMP, MPFR, and some others, but I don’t think I managed to completely clean out everything that wasn’t needed).

Once v0.6 reaches at least the RC stage, I intend to redo that branch, and then you could use it.
It made a substantial difference in the amount of RAM needed per process (I think about half as much was needed at startup). I described it back on the old Google Groups (julia-dev, I think), and I believe I listed the differences in memory usage that I saw there.


Thanks, guys, for the responses. I suspected this might still be a work in progress, so it seems likely to improve in the future. I look forward to seeing your v0.6 branch, Scott.

And yes, the code in Base isn’t quite as modular as I imagined it would be. I have been working my way through LinAlg lately, where, for instance, the BLAS overloads of A_mul_B! etc. are defined in LinAlg instead of in the BLAS module. I’m guessing this code is quite old, but I dunno. I hadn’t thought much about the unit tests, but Scott, you make a good point: they are worth factoring correctly too.

So is anyone aware of any quick wins that work with v0.5 (build flags, using reference BLAS, etc.), or does any progress on memory overhead basically take a lot of work?

Quick-and-dirty splitting off of BLAS should be doable by commenting things out.