Not only for technical computing: changing the narrative around the use case for Julia

@jameson or @Keno: does the size of Julia’s statically compiled binaries reflect a fundamental limitation of Julia’s design or of the current compiler (i.e., the GC and the allocating code style in Base, or multiple dispatch with generic code)? Or can it be improved without an extensive overhaul of the language or the code, perhaps by restricting input types?

I think this is considered insignificant these days on modern desktops/laptops.

Effectively, you are suggesting substituting hardware (storage space), which is super-cheap, with developer effort, which is very expensive and the bottleneck for many things.

Multiply 45.4 MiB (for LLVM) by the number of Julia users, and that’s closer to the number you should compare to the cost of developer effort, if you want to go down that road. The reality is that most of the patches Julia carries for LLVM fix bugs in LLVM, so the benefit of upstreaming them isn’t limited to Julia users; it extends to all users of projects that depend on LLVM. In any case, much of the effort involved falls on the LLVM team, who have to review those patches. When I last checked, several months ago, several patches Julia had sent upstream hadn’t even been looked at yet.

I think this is considered insignificant these days on modern desktops/laptops.

That’s a strong statement. I have to say that @non-Jedi raises some important points about why Julia is not yet a drop-in replacement for a variety of tasks. This does not mean Julia isn’t a very nice general-purpose programming language, but if I recommend Julia to other users (which is roughly what the OP is about), I would bring up the negatives as well as the positives. We have a 30-user setup with 30 × 3 GB (.julia folders) = 90 GB of storage just for Julia packages on a single computer.

My main reason for not recommending Julia for non-numeric applications is compile time. Further, GUI programming does not seem to be a prime focus of the community, which means Gtk.jl is still far from where the Python bindings are. For numerical computing, Julia is much better than its competitors.

Julia devs have done a good job in the past of pushing LLVM patches upstream, but it would be wonderful if sometime in the next few releases, Julia could depend on an unpatched LLVM.

Sure, that would be great. Let’s hope the next LLVM version doesn’t have any bugs. Seriously, though, Julia pushes the LLVM toolchain much harder than most projects do, and it’s this intense level of compiler technology that makes Julia so fast and capable. I don’t anticipate Julia being able to use a vanilla LLVM until we stop pushing so hard on compiler technology, which I don’t see happening anytime soon. Most of the projects in the compiler work thread (that everyone is so excited about) are likely to uncover LLVM bugs that will require patches until proper fixes can be upstreamed. By comparison, the languages you’re comparing Julia to are extremely conservative with the language technology they use. Guido van Rossum has repeatedly rejected proposals to add “fancy language techniques” to CPython, even when they provided significant speedups.

On that note, the package manager and packaging ecosystem aren’t currently very friendly to system Julia installations. My ~/.julia folder is currently 2.8 GiB.

You must have a lot of stuff installed! Mine is only 60 MB. Have you tried running pkg> gc? That will clean up any package versions that are no longer referenced by any manifest file.

As far as I know, there is currently no good way to provide a system-wide installation of common packages that can be shared between users.

There absolutely is. Your default DEPOT_PATH will include something like /usr/share/julia and /usr/local/share/julia. These are where shared installations should go. Any packages installed there with the right permissions will be usable by any user with those depots in their depot path.

Because Julia doesn’t come packaged on most distributions, any software written in Julia must be distributed alongside the Julia runtime. PackageCompiler is great for this if you can get it to work with your software, but it still needs polish, and the binaries it produces are still rather heavy. Other solutions that bundle a whole Julia installation are of course even more heavyweight, and then you’re also stuck with the often very long compilation pause at startup.

This is all quite true. The way forward is to improve PackageCompiler so that it can generate leaner standalone binaries for Julia programs. Of course, that will probably require carrying a bunch of LLVM patches around 🤷‍♂️

Thanks for this; I’ll have to play around with it 🙂. Admittedly, there are still a lot of capabilities of the new Pkg that I need to explore.

I’m very much in favor of the fancy compiler roadmap and understand that it’s not amenable to a stable upstream LLVM (but let me hold on to my vain hope that LLVM will happen to be relatively bug-free in the areas y’all are pushing it). I still think it hurts adoption somewhat (while probably helping it in other areas), and it hurts even more the perception of Julia as a language with applications outside of scientific computing. There are always tradeoffs.

As for why my ~/.julia is so large: after running ]gc, it’s still 2.5 GiB. That’s 423 MiB from GDAL.jl (mostly from the library it wraps), 141 MiB from CMake.jl (it looks like there are two versions there), 175 MiB from DifferentialEquations.jl, 115.8 MiB from Plots.jl, and the rest is relatively small. Frankly, given the sizes I’m seeing for DifferentialEquations.jl, Plots.jl, and CMake.jl, it’s amazing to me that you’re at only 60 MB.

!!! After running ]gc, my usage went from 25 GB to a svelte 24 GB. I had assumed everyone’s .julia tree was huge. That includes 12 GB for Julia 0.6, which I still use a bit (alongside 0.7, to find replacements for deprecations, syntax changes, etc.), 6 GB in packages, 2.4 GB for Conda, 800 MB in dev…

!!! See, predev (some dev cruft) is the first subdirectory under 60 MB:

❯ du -h --max-depth=1 .julia | sort -h -r
23G	.julia
12G	.julia/v0.6
6.0G	.julia/packages
2.4G	.julia/conda
1.3G	.julia/.cache
769M	.julia/dev
291M	.julia/clones
256M	.julia/lib
241M	.julia/v0.7
206M	.julia/compiled
176M	.julia/data
78M	.julia/registries
6.8M	.julia/predev
...
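
For anyone who wants this kind of per-directory breakdown without relying on du, here is a minimal Python sketch of roughly what du --max-depth=1 piped through a reverse sort reports. The helper names tree_size, top_level_sizes and human are our own illustrations, not part of any Julia tooling. It is an approximation: it sums apparent file sizes (st_size) rather than allocated disk blocks, skips symlinks, and counts a hard-linked file once per link, so its numbers can differ from du’s.

```python
from __future__ import annotations

import os
from pathlib import Path


def tree_size(path: Path) -> int:
    """Sum of apparent sizes (bytes) of the files under `path`.

    Symlinks are skipped rather than followed; hard links are counted
    once per link, unlike du, which counts each inode once.
    """
    total = 0
    # os.walk does not descend into symlinked directories by default.
    for root, _dirs, files in os.walk(path):
        for name in files:
            entry = Path(root, name)
            if not entry.is_symlink():
                total += entry.stat().st_size
    return total


def top_level_sizes(top: Path) -> list[tuple[int, str]]:
    """(size, path) for `top` and each immediate subdirectory, largest first."""
    rows = [
        (tree_size(child), str(child))
        for child in top.iterdir()
        if child.is_dir() and not child.is_symlink()
    ]
    rows.append((tree_size(top), str(top)))  # grand total, like du's own line for `top`
    return sorted(rows, reverse=True)


def human(n: float) -> str:
    """Format a byte count with binary (1024-based) suffixes, similar to du -h."""
    if n < 1024:
        return f"{int(n)}B"
    for unit in ("K", "M", "G", "T"):
        n /= 1024
        if n < 1024 or unit == "T":
            return f"{n:.1f}{unit}"
```

For example, printing f"{human(size)}\t{name}" for each row of top_level_sizes(Path.home() / ".julia") gives a listing in the same shape as the one above, with the caveats already noted.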

I’ve been doing a ton of data crunching, with a heavy emphasis on reading and parsing text files that are tens of megabytes in size. I even finally got around to using DataFrames for the first time (yes, it’s great).

Julia is working really well for my application.

edit: grammar and spelling is hard…

I suggest thinking about these clever words before writing a blog post:

Mine is “only” 1.7 GB, and I take care to manually clean up many of the source files left over by the installers. GR behaves nicely and removes all unneeded package files after installation, but many others do not.

Disk space is not as negligible as some claim. Try working on a laptop with an SSD and see.

Julia’s Cmd is awesome. It’s very exciting that you are planning to turn it into a mini shell DSL. I also think the subprocess API could be improved further by borrowing some high-level APIs from Python’s subprocess module, since some of them are genuinely nice. For example, sending data to a subprocess’s stdin while reading its stdout without deadlocking is somewhat non-trivial in Julia at the moment, whereas Python has Popen.communicate for exactly that. Also, relatively recent Python versions have subprocess.run, which makes it possible to do this in one line.
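
To make the Python side of this comparison concrete, here is a minimal sketch of both APIs (Python 3.7+, since it uses the text= and capture_output= arguments). The child command and the helper names roundtrip_popen and roundtrip_run are illustrative only. The child writes output while it is still reading input, so a naive “write all of stdin, then read stdout” approach could block once the OS pipe buffers fill; Popen.communicate and subprocess.run(input=...) service both pipes for you.

```python
import subprocess
import sys

# Child process: another Python interpreter that upper-cases stdin line by line.
# It writes output while it is still reading, which is exactly the situation
# where writing all of stdin before reading any stdout can block.
CHILD = ["-c", "import sys; sys.stdout.writelines(l.upper() for l in sys.stdin)"]


def roundtrip_popen(data: str) -> str:
    """Feed `data` to the child's stdin and collect its stdout via Popen.communicate.

    communicate() services both pipes concurrently, so payloads larger than
    the OS pipe buffer do not deadlock.
    """
    with subprocess.Popen(
        [sys.executable, *CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        out, _err = proc.communicate(data)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, proc.args, out)
    return out


def roundtrip_run(data: str) -> str:
    """The same round trip in a single subprocess.run call."""
    return subprocess.run(
        [sys.executable, *CHILD],
        input=data,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
```

For instance, roundtrip_run("cmd\n") returns "CMD\n". Neither helper has to juggle the two pipes by hand, which is the convenience being suggested for Julia’s Cmd API.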

We somehow need to allow cleaning up repos. DifferentialEquations.jl being so large is completely unnecessary, since all of the things that made it large are no longer in the repo.

The new package manager does not need to clone packages so I’m not sure why this would be an issue anymore.

Very cool! My .julia folder went from 14 GB down to 6 GB!

Good reminder. That’s one of the reasons I want to get more programmers who work on different kinds of problems involved. Traditionally (and perhaps this is changing), scientists aren’t known for writing generalized, reusable code. If the package ecosystem is to flourish, it will help to have programmers along.

You’re discovering the dirty secret that I don’t actually use Julia for anything, because my job is to develop Julia, not to use it. But there’s also the fact that in 1.0 it’s so easy to reproduce a set of packages exactly, so why bother keeping them installed? If I need to run something, I can just instantiate its manifest and I’ll be back in a working state in a few minutes at most.

I don’t think people explicitly choose to keep things installed; rather, cruft just builds up. I guess one could clean it up periodically, but as long as disk space is cheaper than labor, it’s only worth it in egregious cases.

I was surprised by

$ du -sh ~/.julia/conda/
1.4G    /home/tamas/.julia/conda/

but I guess that’s needed by Plots.jl.

It’s needed for the Jupyter Notebook, not for Plots.jl (except for pyplot).

Will it continue to be needed for Jupyter, though?